<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>R-bloggers</title>
	<atom:link href="https://www.r-bloggers.com/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.r-bloggers.com</link>
	<description>R news and tutorials contributed by hundreds of R bloggers</description>
	<lastBuildDate>Fri, 17 Apr 2026 00:00:00 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.5.18</generator>

<image>
	<url>https://i0.wp.com/www.r-bloggers.com/wp-content/uploads/2016/08/cropped-R_single_01-200.png?fit=32%2C32&#038;ssl=1</url>
	<title>R-bloggers</title>
	<link>https://www.r-bloggers.com</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">11524731</site>
	<item>
		<title>Schotter Plots in R</title>
		<link>https://www.r-bloggers.com/2026/04/schotter-plots-in-r/</link>
		
		<dc:creator><![CDATA[Jonathan Carroll]]></dc:creator>
		<pubDate>Fri, 17 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://jcarroll.com.au/2026/04/17/schotter-plots-in-r/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> Translating things between languages reveals how each language approaches<br />
different design trade-offs, and I believe it’s a useful exercise. Having<br />
something to translate is the first step. I found a plot I wanted to generate,<br />
and some code that reproduced it, so off we go!<br />
I don’t recall ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/schotter-plots-in-r/">Schotter Plots in R</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on <strong><a href="https://jcarroll.com.au/2026/04/17/schotter-plots-in-r/">rstats on Irregularly Scheduled Programming</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>]. (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>.)
<hr>Want to share your content on R-bloggers? <a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow">Click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow">here</a> if you don't.
</div>
<p>Translating things between languages reveals how each language approaches
different design trade-offs, and I believe it’s a useful exercise. Having
something to translate is the first step. I found a plot I wanted to generate,
and some code that reproduced it, so off we go!</p>
<p>I don’t recall how I originally found <a href="https://zellyn.com/2024/06/schotter-1/" rel="nofollow" target="_blank">this page</a>
(I didn’t keep a note on it, it seems) and this has been sitting on my
to-be-posted-about pile for way too long, so here’s the post I’ve been meaning to write.</p>
<p>That post details some ALGOL code that generates Georg Nees’ “Schotter”,
a piece of computer-generated art from 1968 which shows a grid of squares that
become increasingly displaced in position and rotation:</p>
<pre>1   &#39;BEGIN&#39;&#39;COMMENT&#39;SCHOTTER.,
2   &#39;REAL&#39;R,PIHALB,PI4T.,
3   &#39;INTEGER&#39;I.,
4   &#39;PROCEDURE&#39;QUAD.,
5   &#39;BEGIN&#39;
6   &#39;REAL&#39;P1,Q1,PSI.,&#39;INTEGER&#39;S.,

7   JE1.=5*1/264.,JA1.=-JE1.,
8   JE2.=PI4T*(1+I/264).,JA2.=PI4T*(1-I/264).,
9   P1.=P+5+J1.,Q1.=Q+5+J1.,PS1.=J2.,
10  LEER(P1+R*COS(PSI),Q1+R*SIN(PSI)).,
11  &#39;FOR&#39;S.=1&#39;STEP&#39;1&#39;UNTIL&#39;4&#39;DO&#39;
12  &#39;BEGIN&#39;PSI.=PSI+PIHALB.,
13  LINE(P1+R*COS(PSI),Q1+R*SIN(PSI)).,
14  &#39;END&quot;.,I.=I+1
15  &#39;END&#39;QUAD.,
16  R.=5*1.4142.,
17  PIHALB.=3.14159*.5.,P14T.=PIHALB*.5.,
18  I.=0.,
19  SERIE(10.0,10.0,22,12,QUAD)
20  &#39;END&#39; SCHOTTER.,

1   &#39;REAL&#39;P,Q,P1,Q1,XM,YM,HOR,VER,JLI,JRE,JUN,JOB.,
5   &#39;INTEGER&#39;I,M,M,T.,
7   &#39;PROCEDURE&#39;SERIE(QUER,HOCH,XMAL,YMAL,FIGUR).,
8   &#39;VALUE&#39;QUER,HOCH,XMAL,YMAL.,
9   &#39;REAL&#39;QUER,HOCH.,
10  &#39;INTEGER&#39;XMAL,YMAL.,
11  &#39;PROCEDURE&#39;FIGUR.,
12  &#39;BEGIN&#39;
13  &#39;REAL&#39;YANF.,
14  &#39;INTEGER&#39;COUNTX,COUNTY.,
15  P.=-QUER*XMAL*.5.,
16  Q.=YANF.=-HOCH*YMAL*.5.,
17  &#39;FOR&#39;COUNTX.=1&#39;STEP&#39;1&#39;UNTIL&#39;XMAL&#39;DO&#39;
18  &#39;BEGIN&#39;Q.=YANF.,
19  &#39;FOR&#39;COUNTY.=1&#39;STEP&#39;1&#39;UNTIL&#39;YMAL&#39;DO&#39;
20  &#39;BEGIN&#39;FIGUR.,Q.=Q+HOCH
21  &#39;END&#39;.,P.=P+QUER
22  &#39;END&#39;.,
23  LEER(-148.0,-105.0).,CLOSE.,
24  SONK(11).,
25  OPBEN(X,Y)
26  &#39;END&#39;SERIE.,</pre>
<div class="float">
<img src="https://i1.wp.com/jcarroll.com.au/2026/04/17/schotter-plots-in-r/images/schotter.jpg?w=578&#038;ssl=1" alt="Schotter" data-recalc-dims="1" />
<div class="figcaption">Schotter</div>
</div>
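<p>The geometry in <code>QUAD</code> is simple once unpicked: each square has half-diagonal <code>R = 5*1.4142</code> (that is, 5&#8730;2 for a 10&#215;10 cell), its first corner sits at a jittered angle <code>PSI</code>, and the remaining corners follow at steps of <code>PIHALB</code> (&#960;/2). A minimal Python sketch of just that geometry (my restatement, not the author&#8217;s code):</p>
```python
import math

def square_corners(cx, cy, psi, r=5 * 1.4142):
    """Corners of a square with half-diagonal r centred at (cx, cy):
    the first corner at angle psi, the rest at half-pi steps,
    mirroring the LEER/LINE loop in the ALGOL QUAD procedure."""
    return [
        (cx + r * math.cos(psi + k * math.pi / 2),
         cy + r * math.sin(psi + k * math.pi / 2))
        for k in range(4)
    ]

# With no jitter (psi = pi/4) this recovers the axis-aligned 10x10 cell:
corners = square_corners(0.0, 0.0, math.pi / 4)
```
<p>As the jitter on the centre and on <code>psi</code> grows down the grid, these corners drift further from the axis-aligned cell, which is the entire visual effect.</p>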
<p>What’s missing from this ALGOL code are the seeds needed to reproduce the plot.
<a href="https://zellyn.com/2024/06/schotter-2/" rel="nofollow" target="_blank">The author went down a rabbit hole</a>
investigating and calculating different values, but managed to determine them to
be “(1922110153) for the x- and y-shift seed, and (1769133315) for the rotation
seed”. They also provided a translation into Python:</p>
<pre>import math
import drawsvg as draw

class Random:
    def __init__(self, seed):
        self.JI = seed

    def next(self, JA, JE):
        self.JI = (self.JI * 5) % 2147483648
        return self.JI / 2147483648 * (JE-JA) + JA

def draw_square(g, x, y, i, r1, r2):
    r = 5 * 1.4142
    pi = 3.14159
    move_limit = 5 * i / 264
    twist_limit = pi/4 * i / 264

    y_center = y + 5 + r1.next(-move_limit, move_limit)
    x_center = x + 5 + r1.next(-move_limit, move_limit)
    angle = r2.next(pi/4 - twist_limit, pi/4 + twist_limit)

    p = draw.Path()
    p.M(x_center + r * math.sin(angle), y_center + r * math.cos(angle))
    for step in range(4):
        angle += pi / 2
        p.L(x_center + r * math.sin(angle), y_center + r * math.cos(angle))
    g.append(p)

def draw_plot(x_size, y_size, x_count, y_count, s1, s2):
    r1 = Random(s1)
    r2 = Random(s2)
    d = draw.Drawing(180, 280, origin=&#39;center&#39;, style=&quot;background-color:#eae6e2&quot;)
    g = draw.Group(stroke=&#39;#41403a&#39;, stroke_width=&#39;0.4&#39;, fill=&#39;none&#39;,
                   stroke_linecap=&quot;round&quot;, stroke_linejoin=&quot;round&quot;)

    y = -y_size * y_count * 0.5
    x0 = -x_size * x_count * 0.5
    i = 0

    for _ in range(y_count):
        x = x0
        for _ in range(x_count):
            draw_square(g, x, y, i, r1, r2)
            x += x_size
            i += 1
        y += y_size
    d.append(g)
    return d
  
d = draw_plot(10.0, 10.0, 12, 22, 1922110153, 1769133315).set_render_size(w=500) 
print(d.as_svg())</pre>
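<p>The <code>Random</code> class in that listing is a multiplicative congruential generator, <code>JI &#8592; 5&#183;JI mod 2&#179;&#185;</code>, rescaled into <code>[JA, JE)</code>. Its first update from the recovered x/y-shift seed can be checked in isolation; this is a standalone restatement of the listing&#8217;s update rule, nothing more:</p>
```python
class Random:
    """Multiplicative congruential generator from the Python listing:
    state <- state * 5 mod 2^31, rescaled into [ja, je)."""
    def __init__(self, seed):
        self.ji = seed

    def next(self, ja, je):
        self.ji = (self.ji * 5) % 2147483648
        return self.ji / 2147483648 * (je - ja) + ja

r1 = Random(1922110153)   # the recovered x/y-shift seed
u = r1.next(0.0, 1.0)     # first draw, rescaled to [0, 1)
```
<p>Because the state update is deterministic, the first state is just <code>(1922110153 * 5) mod 2147483648 = 1020616173</code>, which is why the exact seeds matter for reproducing the plot pixel for pixel.</p>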
<!-- Inline SVG output of draw_plot() elided: a 180 by 280 drawing (rendered at width 450) on an #eae6e2 background, containing the 12 by 22 grid of #41403a square outlines that grow increasingly jittered toward the bottom: the reproduced Schotter. -->
<path d="M8.10798739875164,-0.46776428623223865 L9.627436816593727,-10.351550139318821 L-0.2563470204864222,-11.871012670916201 L-1.7758226658364178,-1.9872308498598832 L8.107957139178666,-0.467742090759943" />
<path d="M21.72991516729038,-0.5380050331673543 L21.860778368666693,-10.537046194477806 L11.861737380993672,-10.66792266253079 L11.73084764626425,-0.6688818485127968 L21.72988828660961,-0.5379788471071709" />
<path d="M31.272243121565367,-0.3286109305491678 L31.931198620982837,-10.306773435381537 L21.953036990458045,-10.965742173773515 L21.294055013092713,-0.9875814175738657 L31.27221489494966,-0.3285862012363463" />
<path d="M39.63529978638704,-0.28909779063421404 L42.897358391631734,-9.741976874754783 L33.4444836356022,-13.00404802202834 L30.182399946305527,-3.5511775941064747 L39.635266046103084,-0.28908136279243024" />
<path d="M48.72611090510717,-1.494976703583757 L49.995981600485074,-11.413917057274546 L40.077042931660976,-12.683800913050774 L38.80714591548866,-2.764863929110823 L48.726081214527,-1.4949537525446521" />
<path d="M60.45029502103119,1.2121553952830717 L59.075728985858,-8.69281898822764 L49.170752778588806,-7.318266094925123 L50.54529253001823,2.586711936085116 L60.45027238475208,1.212185326531198" />
<path d="M-50.24643576859957,9.477430940389725 L-49.341126751380884,-0.4814025709406895 L-59.29995906154315,-1.3867248014880493 L-60.205294505417605,8.572106307488532 L-50.246464597644234,9.47745496468848" />
<path d="M-37.96031353286069,9.223036644081255 L-41.24456320959244,-0.22215543358145684 L-50.68975964477255,3.0620817113147645 L-47.40553503171765,12.507282503995643 L-37.960329881552624,9.223070422787833" />
<path d="M-27.79193890553258,7.848292610032935 L-28.70253120790566,-2.1100592105202036 L-38.66088423661925,-1.1994801208382952 L-37.75031835963012,8.758874116018223 L-27.79196291464821,7.848321451723468" />
<path d="M-23.330917107557987,10.09981725547049 L-19.959270953150416,0.6854690819364482 L-29.373614653193258,-2.686189563377268 L-32.745285789407184,6.728149663157803 L-23.330951036396453,10.099833290266005" />
<path d="M-8.597231310764265,8.875756296763575 L-7.460971151194861,-1.059376423680097 L-17.396102364045603,-2.1956497651318903 L-18.53238888737779,7.739479940108438 L-8.597260689765363,8.875779645319012" />
<path d="M-0.39152443968304596,8.369949651925884 L-2.784070739808032,-1.3395133365618603 L-12.49353690270545,1.0530200810950752 L-10.101016367520774,10.76248941838508 L-0.3915438558552804,8.36998176567687" />
<path d="M11.805206545863314,5.648093841095626 L7.950439250888133,-3.5789706121660303 L-1.2766303168509783,0.2757844403837306 L2.5781124932665795,9.502859122584052 L11.805192289911865,5.648128554905483" />
<path d="M22.767610762596796,9.384370124358046 L19.44322019056654,-0.04676854749125692 L10.012077107941089,3.277609511349414 L13.33644265358632,12.708757004734402 L22.76759455771425,9.38440397229046" />
<path d="M28.073829696013,11.488727973907942 L27.667327696751933,1.4970962028881623 L17.67569538639617,1.9035849453028257 L18.082170871963715,11.895217794976983 L28.07380426093868,11.48875556625727" />
<path d="M40.81066933659364,8.858505162070474 L38.77730312284473,-0.9324790154185045 L28.98631624750445,1.1008742077007798 L31.01965647999053,10.891863780875118 L40.81064875098169,8.858536539025824" />
<path d="M48.28335318466582,8.476725592077305 L46.43208733706194,-1.3503168971231263 L36.605042391620074,0.5009359120093029 L38.456282162277795,10.327983313675295 L48.28333202015062,8.476756581495543" />
<path d="M61.069099183971005,7.51236887582233 L60.20291977883354,-2.449944237163571 L50.240605516613996,-1.5837780499730663 L51.10675848585602,8.378537361462591 L61.06907504649024,7.512397610170575" />
<path d="M-52.3521775724961,19.447072845852468 L-48.27350149603288,10.316774341074995 L-57.40379458923571,6.238086150581903 L-61.48249489375149,15.36837383219402 L-52.352212623746155,19.447086250725313" />
<path d="M-44.695557939887685,22.138860588019945 L-40.89651175103503,12.888714784951798 L-50.14665251353994,9.089656323056243 L-53.94572324847171,18.33979204498163 L-44.69559256714212,22.13887505294292" />
<path d="M-31.350695884610836,19.85920639964653 L-29.12428575022971,10.110306603434644 L-38.87318259244341,7.883883534264982 L-41.09961859639769,17.632777422463334 L-31.350727662231844,19.859226361198314" />
<path d="M-18.676643658697678,20.53827808645673 L-16.147814243750865,10.863415239203812 L-25.82267373575731,8.334572987700573 L-28.351528823812522,18.009429124443503 L-18.67667604235013,20.538297049046243" />
<path d="M-10.430458821757108,24.20230715733701 L-6.477436300676238,15.016907750077543 L-15.662830463077514,11.063873041859095 L-19.615877358426577,20.24926195938601 L-10.430493685790193,24.20232104185873" />
<path d="M-3.1628965797867146,19.706463577250766 L-0.47020898380954446,10.075919312593351 L-10.10074967581432,7.383218938861569 L-12.793462827295977,17.01375605819675 L-3.1629292806473344,19.70648198742354" />
<path d="M10.28863065458726,15.042868752569326 L6.278122762077047,5.882422487939616 L-2.8823288236660054,9.892918226412947 L1.1281547607633846,19.053375133253216 L10.28861698868474,15.042903702874826" />
<path d="M22.868435173835838,17.614385346216938 L20.102914717649462,8.004502624275434 L10.493028326437992,10.770010330116019 L13.258523281927923,20.379900390580545 L22.868417011628583,17.614418185446134" />
<path d="M29.661604284451613,24.249981405821188 L33.541687467862445,15.033533944197618 L24.325245154321557,11.15343853245474 L20.445137514253467,20.369875697896727 L29.66156953158033,24.249995566283197" />
<path d="M37.00038449961981,19.27521166291432 L38.83764735339691,9.445541549976348 L29.00797967813857,7.608265654244726 L27.17069074045566,17.437930891806133 L37.00035354030283,19.275232871437098" />
<path d="M52.20961095857639,21.68992378530623 L52.345858400734656,11.690954540551026 L42.34688933676066,11.55469383181152 L42.21061536144015,21.553662714986697 L52.209584063798914,21.689949956887975" />
<path d="M58.25947102872713,20.053509069128935 L58.94456251227291,10.077106963047397 L48.96816131517604,9.392002242862816 L48.283043358353865,19.368402530957464 L58.25944273742873,20.053533724416017" />
<path d="M-51.409591594756094,30.329790715240765 L-48.02743209940961,20.91921440605052 L-57.43800392115961,17.537042424802387 L-60.820188388303436,26.94760975909558 L-51.40962554148362,30.329806712129148" />
<path d="M-36.78170325890691,32.19713534008611 L-38.0990505022437,22.284388719520848 L-48.011798870649834,23.601722810674847 L-46.69447793168094,33.51447292690441 L-36.7817260676454,32.19716514012294" />
<path d="M-31.88510993997008,32.300203083994724 L-29.166940696010037,22.676819828390446 L-38.790320345152765,19.95863781617701 L-41.50851512561482,29.582013858841243 L-31.885142689446017,32.30022140754713" />
<path d="M-25.073606321767592,29.951872032015405 L-20.82261389779206,20.900517838616622 L-29.873962450987822,16.649513405354277 L-34.124978893529494,25.700856318331084 L-25.073641620787573,29.9518847701446" />
<path d="M-13.361179681737555,30.945712419351203 L-10.356467941706597,21.407910959770735 L-19.894265414642465,18.403186565036123 L-22.899002464075448,27.940980051310603 L-13.36121296447915,30.945729755436663" />
<path d="M3.240805035060984,25.694970927680977 L-0.4721950945533111,16.40995184528059 L-9.757219103335151,20.122939655575685 L-6.044243612365792,29.407968590722632 L3.2407902491299114,25.69500541908554" />
<path d="M10.790573581442104,32.695689151541394 L10.929670599996347,22.69675914166111 L0.9307407746780845,22.557648856577682 L0.7916172230657113,32.55657849731633 L10.79054667920714,32.6957153154574" />
<path d="M16.59023891673982,28.492874221247956 L17.289415294850155,18.5174493322847 L7.3139913335593265,17.818259718832145 L6.614788484765784,27.79368275243298 L16.59021059065907,28.49289883656628" />
<path d="M25.850091275002733,28.508584598427493 L29.297987059001375,19.12189380849841 L19.911300843731084,15.673985570289366 L16.463380151317708,25.06066721088434 L25.85005721722084,28.508600357495983" />
<path d="M43.80897191806094,30.788835255331612 L41.89414119688982,20.973980562962783 L32.079283963942,22.888798261833106 L33.994088640508146,32.703658035342634 L43.808950954562114,30.788866381084027" />
<path d="M48.50131881283751,32.325551285843616 L48.686087277593174,22.327360952713974 L38.6878971896222,22.14257922241056 L38.50310219377136,32.14076906520528 L48.50129179137223,32.32557732660322" />
<path d="M62.12226109401942,29.698837036835076 L63.0503306715663,19.742098806757568 L53.09359367285553,18.81401601866203 L52.16549767421298,28.77075178598853 L62.122232210137675,29.698860995176474" />
<path d="M-50.83766377619863,42.107080231858276 L-46.33127872717803,33.180129850399965 L-55.258223129579804,28.67373295715112 L-59.764631867048955,37.600671380480634 L-50.83769942280741,42.10709196216216" />
<path d="M-44.07568154160499,38.3971885795293 L-40.73855895726908,28.97054749268557 L-50.1651956164273,25.633412401133306 L-53.50234321519003,35.06004463258942 L-44.07571541145263,38.39720473855675" />
<path d="M-30.517239613775367,40.43933767100262 L-33.584171752158255,30.92136117483447 L-43.10215231750793,33.98828068481202 L-40.03524543594113,43.506265319326474 L-30.517256732278646,40.4393710661758" />
<path d="M-22.954979071331536,39.09140512092678 L-18.45186639591141,30.16280357558242 L-27.380461966540995,25.659679053743293 L-31.883598334791195,34.58826864964239 L-22.955014713638313,39.09141686429573" />
<path d="M-9.41891675312968,40.41875041431992 L-6.440299781068754,30.87276747609896 L-15.986278767267518,27.89413783847921 L-18.964921070440848,37.44011287263897 L-9.418949988336896,40.41876784136063" />
<path d="M-4.161590469393531,40.21594305907277 L0.559751102979213,31.40079229008192 L-8.255393401751975,26.67943902181628 L-12.976758365902205,35.49457726227228 L-4.161626389736895,40.21595392229879" />
<path d="M10.21121265428821,42.071617443008535 L14.12552791284303,32.86965649005399 L4.923572153390099,28.955329022387787 L1.0092324766193963,38.15727958832291 L10.21117784902051,42.07163147419122" />
<path d="M18.55630026033303,39.34432628896722 L17.956805903887236,29.36241485679929 L7.97489367632204,29.96189596929541 L8.574361544867426,39.94380899224029 L18.55627536317444,39.344354367646694" />
<path d="M34.31832089457123,36.233791729241055 L31.347396714784626,26.68541182262787 L21.799012866372802,29.656323333670123 L24.769911708665486,39.204711123863774 L34.31830344062415,36.23382495032589" />
<path d="M44.859234147899336,39.20697447731243 L40.38689991275729,30.262916348463335 L31.442835850045835,34.735238716670764 L35.915146351310774,43.67930871322885 L44.85922228169973,39.207010078914266" />
<path d="M49.256777818218744,41.187630820655166 L47.140085750128215,31.414322066096258 L37.36677418716168,33.531001167008746 L39.4834403208924,43.304315538365714 L49.256757500622584,41.18766237182049" />
<path d="M58.275667842937054,38.25468027236332 L58.20193191225475,28.25505466259265 L48.20230620466042,28.328777325822664 L48.27601560043801,38.32840313122304 L58.27564150362684,38.25470700289801" />
<path d="M-53.18047651529454,47.77504954692451 L-53.55118766941076,37.78202585220283 L-63.54421185598131,38.152723747625885 L-63.17352721925207,48.145748426027716 L-53.180502049036555,47.77507704799295" />
<path d="M-41.41294157747009,45.79349795170366 L-43.26649412805409,35.96688651818911 L-53.09310802084404,37.82042603087358 L-51.239581546062375,47.647042382921654 L-41.41296273477348,45.79352894604604" />
<path d="M-32.26360003953363,53.132808757683456 L-30.813104647572803,43.238668449351685 L-40.707243031385985,41.78815992989727 L-42.15776467833144,51.68229638917444 L-32.26363014360769,53.13283116360838" />
<path d="M-21.24857733812722,47.401060501019 L-24.74347342984925,38.03176753777353 L-34.11277103009678,41.526651198362394 L-30.617899800647226,50.89595343559548 L-21.248592926445077,47.40109463729139" />
<path d="M-9.993196249563299,52.00086449020914 L-6.148088084862568,42.769770702103266 L-15.379176771290428,38.92465028963779 L-19.22430943151388,48.15573387437138 L-9.9932309484908,52.00087878234604" />
<path d="M3.059459034039297,46.420779415606276 L0.9029295988738477,36.656183667787516 L-8.86166901020856,38.81270014733526 L-6.705165486282317,48.57730161766413 L3.059438845275835,46.42081104936319" />
<path d="M13.367154995693951,53.575191935825806 L13.507177521382856,43.57627484369254 L3.5082606150395605,43.43623905149159 L3.3682115563268082,53.43515577202731 L13.367128091037348,53.575218097251614" />
<path d="M21.04052630376069,53.453451864887796 L21.743791544055107,43.47831440594274 L11.768655018207566,42.77503593068747 L11.065363307992675,52.75017152341994 L21.040497967592515,53.45347646859321" />
<path d="M27.65425164867093,51.361847141062604 L31.118896962253306,41.98132571770203 L21.7383801357747,38.51666795809475 L18.27370993014862,47.89718018767486 L27.654217562813727,51.361862839313645" />
<path d="M41.25007228661116,45.66232823752794 L36.71526775632355,36.749781059804945 L27.80271456185289,41.28457376496645 L32.33749544188031,50.19713297616908 L41.250060669799225,45.66236392128373" />
<path d="M52.986716960170874,47.068776868890325 L51.940416549972056,37.12376783044933 L41.99540612331378,38.17005504565999 L43.041680143534435,48.115066860518056 L52.986693346574775,47.068806035289235" />
<path d="M59.32655366185392,44.858828362811266 L54.29822718892211,36.21510912832266 L45.65450128288321,41.243424132807476 L50.68280481891218,49.887156710381454 L59.32654406800602,44.858864642817174" />
<path d="M-46.16011694295546,59.8285600848027 L-49.714611212581055,50.48171463779471 L-59.06146137566567,54.03619650607034 L-55.50699190874627,63.38305138521513 L-46.16013231355775,59.82859431965821" />
<path d="M-41.25173220385054,59.10975324694059 L-36.720864518602895,50.19520405240283 L-45.635407701600684,45.6643245393808 L-50.16629904238913,54.57886171082293 L-41.25176788251835,59.10976487936981" />
<path d="M-27.91135296474256,60.73993741978442 L-27.364285997393022,50.755015434296 L-37.349207257027,50.20793521900341 L-37.89630072026168,60.192855752765354 L-27.91138091238933,60.73996246394116" />
<path d="M-21.08883175171522,58.30698184449452 L-18.372637047669286,48.68304108964945 L-27.99657419867262,45.96683361661043 L-30.712794440699945,55.59076716375508 L-21.088864497430922,58.307000174765925" />
<path d="M-14.546925447215283,63.25447020964938 L-12.401331395821884,53.48746578200216 L-22.16833297669727,51.34185877179902 L-24.313952945706376,61.108857505885375 L-14.546957058426237,63.25449043369667" />
<path d="M-1.877561104397845,57.60554219961185 L-2.730718042755588,47.64210541496835 L-12.694155959354585,48.495249133888265 L-11.841025459874007,58.45868818242523 L-1.8775852794166186,57.60557090238548" />
<path d="M14.190637578722981,60.97790963004257 L10.028559441122358,51.8853272191121 L0.9359715079758644,56.04739329271712 L5.098025517577955,65.13998674806363 L14.190624495108487,60.977944802471804" />
<path d="M15.407933145760872,58.74121743570667 L17.501015788572865,48.96282521748315 L7.722626347449319,46.86972960075211 L5.629517756802912,56.648116264758464 L15.407901643674867,58.74123782931655" />
<path d="M27.641089741222928,57.19917903684005 L26.34124327348286,47.284122381260175 L16.426184893282056,48.58395569375253 L17.726005050524403,58.49901579855681 L27.641066879914717,57.19920879656675" />
<path d="M38.31798181072264,62.181984586224196 L43.46707296518299,53.60965645402386 L34.89475166477807,48.460553925846824 L29.745637762893413,57.032868394441245 L38.31794539966224,62.18199367002446" />
<path d="M50.31109627344028,61.720657196429066 L53.40159012965021,52.21030509738543 L43.89124213106641,49.11979862289159 L40.8007230382941,58.63014252099879 L50.31106283590798,61.720674232044125" />
<path d="M56.81096302511841,64.34066014324657 L60.435552869667035,55.0207729238726 L51.11567045938858,51.39617071374837 L47.491055883695125,60.71604831491491 L56.81092867573334,64.34067525617121" />
<path d="M-50.96487017254155,69.07159352253571 L-56.309825899668596,60.62000209854301 L-64.76142441531384,65.96494661213697 L-59.41649111526236,74.41655221941988 L-50.96487841635666,69.07163013292029" />
<path d="M-40.50719076075693,68.8423465413935 L-44.34527219507368,59.608328985280224 L-53.57929484352569,63.446398167946235 L-49.74123791251718,72.68042590872072 L-40.507205079436474,68.84238122937647" />
<path d="M-29.53034881956114,69.26392235022459 L-24.982161687225258,60.35819704948073 L-33.88788095344981,55.80999810107797 L-38.43609171791144,64.71571133276754 L-29.530384520772543,69.26393391328001" />
<path d="M-23.145228928108583,70.37750713481482 L-24.05364116779316,60.418956209813416 L-34.01219329806253,61.32735523654265 L-33.103807484290236,71.28590857206244 L-23.14525294353755,70.3775359712487" />
<path d="M-11.540252593133513,65.05786246945344 L-15.953149849342987,56.084329880266836 L-24.92668829353125,60.49721523043518 L-20.513814849411798,69.47075952960931 L-11.540264695267606,65.05789799154653" />
<path d="M-3.0983746699899637,70.44605383990259 L-1.20098785963473,60.62781201078955 L-11.019227171295999,58.73041217364283 L-12.91664003523086,68.54864896783504 L-3.0984057585249976,70.4460748585547" />
<path d="M15.187748785060673,65.96990400598649 L12.323995947820915,56.3888357455963 L2.7429238878265068,59.25257587072107 L5.606651300831237,68.83365173030283 L15.187730959983481,65.96993702942318" />
<path d="M24.60430110292888,72.82134939121602 L22.581403064677115,63.02819704634299 L12.788248035841901,65.05108209108842 L14.811120087077441,74.84423980386856 L24.60428048378528,72.82138074614647" />
<path d="M25.16081772721214,73.54640272615903 L27.519122888108434,63.82856623385556 L17.80128952480077,61.47024817938548 L15.44295857676105,71.18807841368023 L25.160785682025786,73.54642225528544" />
<path d="M40.704885634152745,68.08933578957655 L39.99191470875088,58.11488734454529 L30.01746531776221,58.82784503589923 L30.730409775066974,68.80229537282776 L40.70486105791779,68.08936414957341" />
<path d="M52.40189331753321,69.7422419809502 L48.89348916606416,60.37799883838871 L39.529241368578184,63.88639056542466 L43.037620671174864,73.25064301781863 L52.40187777846034,69.7422761396674" />
<path d="M61.64626747246583,63.550899306851015 L56.664538317455765,54.88024051128203 L47.99387291216159,59.86195816210187 L52.97557905878244,68.53263017710597 L61.646257683481195,63.55093553469288" />
<path d="M-46.09750088855164,78.84403670629118 L-46.5895211421967,68.85625089595973 L-56.57730760532932,69.34825789786112 L-56.08531385517247,79.33604501377732 L-46.09752608649026,78.84406451536589" />
<path d="M-37.35641263770574,80.5825086361341 L-39.80383831057335,70.88673342737209 L-49.499616766558695,73.33414623593248 L-47.05221682230984,83.02992793912412 L-37.356431871928976,80.5825408591911" />
<path d="M-32.090303585316896,79.9063113518321 L-31.539492639177283,69.92159520040006 L-41.524208059787384,69.3707710065905 L-42.075045501265926,79.35548569636109 L-32.09033154235243,79.90633638550763" />
<path d="M-25.468340930812875,80.39371858843754 L-22.735518825056985,70.77448615225288 L-32.354747635338754,68.04165128375088 L-35.087595266582035,77.66087646811283 L-25.468373708156847,80.3937368620926" />
<path d="M-8.658216602183892,77.40931345378435 L-7.845616092432351,67.44248699123335 L-17.812441476820357,66.62987325754806 L-18.625068434438006,76.59669756375553 L-8.658245206428605,77.40933774530409" />
<path d="M0.06283236963947081,82.06715364510926 L4.036008783839073,72.89045388404575 L-5.140685705626174,68.91726529425124 L-9.113886471008609,78.09395451210206 L0.06279747521164003,82.06716745306542" />
<path d="M14.877232276411112,80.32777267913141 L9.317249374164948,72.01606169704202 L1.0055310151259294,77.57603357134758 L6.565491861481094,85.88775930732159 L14.877224974375464,80.3278094889266" />
<path d="M19.121616102554082,76.13160301894138 L16.095409049014027,66.60059991369796 L6.564401928622932,69.62679432154908 L9.590583690779768,79.15780545707108 L19.121598841415903,76.13163634061384" />
<path d="M27.679884461205685,76.49146117765704 L25.582352992784738,66.7140222971506 L15.804911329292857,68.81154079291369 L17.902416852394406,78.58898523937381 L27.679864081805533,76.4914926889375" />
<path d="M41.86306583002606,81.86277572254966 L44.6015966904496,72.24516697351214 L34.983991574889316,69.5066233524968 L32.24543519328701,79.12422483456292 L41.86303304184203,81.86279397674738" />
<path d="M50.11421056625046,80.30876480186146 L50.598645663931265,70.32060821722632 L40.61048972205094,69.83615986731077 L40.12602811990148,79.8243151664187 L50.11418277621944,80.30879002080121" />
<path d="M59.14785889671382,82.87724510181306 L61.5576947956895,73.17205884584335 L51.85251173708632,70.7622100700782 L49.44265008453593,80.46738993129779 L59.14782674835483,82.87726446062916" />
<path d="M-45.45230217670333,91.79717449822195 L-46.791567596391275,81.88736534472746 L-56.70137852680757,83.22661761613001 L-55.36213940369276,93.13643032345067 L-45.45232491948519,91.79720434862595" />
<path d="M-34.87999792286725,88.13510319941291 L-41.125853730835566,80.32567079051637 L-48.93529442669484,86.57151623696413 L-42.68945934177864,94.38096521977239 L-34.88000207203514,88.13514049639875" />
<path d="M-28.755643389847787,84.05398977026489 L-34.81371917960347,76.09800667270673 L-42.769710314978674,82.1560719064993 L-36.711655637159865,90.11207107967755 L-28.7556484261926,84.05402695784316" />
<path d="M-19.82790001274616,83.03013812134895 L-24.789734892844244,74.34807938244968 L-33.47180021507307,79.30990274323217 L-28.509988373614906,87.99197464877527 L-19.827909884772822,83.03017432665017" />
<path d="M-8.308970452808417,91.20867745034722 L-6.779697874247343,81.32640672426831 L-16.661966571286506,79.79712103396221 L-18.19126537333493,89.67938770194422 L-8.309000734427457,91.20869961573229" />
<path d="M3.322812229068682,89.99742542136163 L-2.4498631323372138,81.83200023372612 L-10.615295979121747,87.60466476128246 L-4.8426422854251125,95.77010526720164 L3.3228058796143447,89.99746240737488" />
<path d="M9.807053510997232,86.19849303629475 L6.798925297964959,76.6617685474037 L-2.7378031820868665,79.66988410715594 L0.270299724380056,89.20661657835174 L9.80703618670305,86.19852632517544" />
<path d="M19.446833380275976,90.81902554752563 L18.090749059561027,80.91150392816157 L8.183225640959938,82.26757510362619 L9.53928367117186,92.17510032144688 L19.446810688194702,90.81905543649005" />
<path d="M29.10171025776338,88.71177738475956 L22.79020231326766,80.95530803397743 L15.03372458841583,87.26680568722364 L21.345211950401424,95.02329178613152 L29.1017064233517,88.71181471541765" />
<path d="M40.162248762300976,89.46901762429502 L34.43849391006048,81.26922618094059 L26.238694872464542,86.9929701537346 L31.962427965802004,95.19277678555761 L40.162242191837656,89.46905457168678" />
<path d="M50.89728500547778,88.72171322996812 L48.63987168940843,78.97994570844062 L38.898101172765045,81.23734609918056 L41.155488638171576,90.97911961092268 L50.8972651449953,88.72174507085352" />
<path d="M55.862813180095614,90.66891438205482 L58.614083778345915,81.05494228836439 L49.00011533503572,78.3036589343474 L46.248819225256895,87.9176237272604 L55.86278036775581,90.66893259279624" />
<path d="M-50.70528484397037,96.04965775715351 L-50.99070070304685,86.05383427379594 L-60.98652456508393,86.33923687046459 L-60.701135230823624,96.33506111116358 L-50.705310611480314,96.04968503931211" />
<path d="M-42.31544725500711,94.98731383973777 L-43.74381841266602,85.08995534572416 L-53.64117880182648,86.51831337161707 L-52.21283390770208,96.41567565590697 L-42.31546972830019,94.98734389355357" />
<path d="M-29.377877103036354,98.84470524051173 L-35.968821705783206,91.32423992468337 L-43.489295766436605,97.91517454930941 L-36.898371119942965,105.43565735477462 L-29.377879569679195,98.844742686425" />
<path d="M-17.11311382617765,97.45992828142782 L-23.716645288841505,89.95051275525911 L-31.22606957653545,96.55403425446295 L-24.622558040803238,104.06346730366892 L-17.113116230098484,97.45996573141996" />
<path d="M-5.584398084020825,99.64712533502647 L-12.161213523437333,92.11430063250566 L-19.69404695203668,98.691106077403 L-13.117251501670074,106.22394823206763 L-5.5844006209534465,99.64716277624333" />
<path d="M5.616940374216903,98.25990786998281 L-0.732554690268041,90.53450420744605 L-8.457966777275654,96.88398902189924 L-2.1084922128653734,104.6094095333641 L5.616936723043137,98.25994521900792" />
<path d="M8.505767561787277,96.14499692966167 L3.4814720963295036,87.49893398755701 L-5.164597511977135,92.52321798145817 L-0.14032498964142115,101.16929425595157 L8.505757951023497,96.14503320519015" />
<path d="M18.951530389701052,103.24007864916992 L21.873475402442487,93.67659671707162 L12.309997347174338,90.75463901555374 L9.388026956885147,100.3181131939749 L18.951497258442494,103.24009627303025" />
<path d="M26.979281921655755,103.23117333340332 L27.10948613184541,93.23212356921552 L17.110436540420697,93.10190609233777 L16.980205796855103,103.1009555109818 L26.979255042700846,103.23119952123503" />
<path d="M40.94066616850447,104.3441649888936 L41.593812146553795,94.36562052317049 L31.615268547430226,93.71246130563988 L30.962096090419458,103.69100403814635 L40.94063795629126,104.3441897346361" />
<path d="M51.79015463576991,98.2902602793217 L55.01272399457626,88.82384617202968 L45.54631416298114,85.60126425323624 L42.32371968420628,95.06766980911777 L51.79012096435758,98.29027684786838" />
<path d="M56.17230069515767,100.1057885162437 L57.70316570486002,90.22376434884136 L47.821143568610246,88.69288622772112 L46.29025233607482,98.57490633280092 L56.17227040996725,100.10581067674883" />
<path d="M-46.95832606781953,109.42115994986116 L-53.61111781198892,101.9553504604778 L-61.076936128255845,108.60813229904342 L-54.42416419530573,116.07395944218077 L-46.95832822531111,109.42119741485786" />
<path d="M-42.09376074869195,101.35050613169413 L-48.090818252875096,93.34842823813575 L-56.09290410329172,99.34547512519751 L-50.09586783336191,107.3475689324583 L-42.09376606927105,101.35054327967099" />
<path d="M-25.9881857840509,107.5673759259595 L-28.03196595203444,97.77856026991549 L-37.82078431974693,99.82232745014657 L-35.77703012727189,109.6111485295103 L-25.98820633627414,107.56740732479493" />
<path d="M-22.48147829932353,107.97263271975996 L-17.827972819038294,99.12148386393281 L-26.679115500610337,94.46796663999253 L-31.33264446819745,103.31910314729389 L-22.481514135182373,107.97264385851965" />
<path d="M-7.192818204270218,108.29373788021996 L-12.976422559149865,100.1360500957266 L-21.134118017292792,105.91964362702264 L-15.350535309590551,114.07734675880077 L-7.19282450419158,108.29377487470249" />
<path d="M1.7085899981778931,114.00857025122454 L1.3348403745573467,104.01565973717015 L-8.658070635377332,104.3893961022477 L-8.284347528843446,114.38230760804508 L1.7085644727990568,114.00859776005551" />
<path d="M10.033732568892207,106.86262772843693 L8.168216421448557,97.0382804815873 L-1.6561333005497412,98.9037835941355 L0.20935677709975709,108.72813579126517 L10.033711449343487,106.86265874851767" />
<path d="M15.57191030090365,107.8769481671428 L21.211164953018113,99.61881543076343 L12.953039698780312,93.97954982180556 L7.313763132988962,102.23766759388727 L15.571873422900028,107.87695511650216" />
<path d="M31.601717590577508,105.84610816598538 L30.735448204081937,95.88380287689311 L20.773141765636638,96.7500590454521 L21.63938471625753,106.7123666332329 L31.601693453356283,105.84613690055164" />
<path d="M41.78287949539453,103.49849533530632 L37.565590031938314,94.43138891128788 L28.498478012449723,98.64866634454985 L32.715743415510026,107.71578395949264 L41.782866625891046,103.49853058664142" />
<path d="M55.51198344089117,106.82679477965826 L49.65109450859663,98.72445374472505 L41.54874569747303,104.5853319268697 L47.40961312945747,112.68768851416942 L55.51197749291906,106.82683183235551" />
<path d="M59.00345474975128,107.72734799580972 L58.8692612423792,97.72835097482636 L48.87026404335737,97.86253121558009 L49.004431017492536,107.86152859262278 L59.00342857253848,107.72737488510641" />
</g>
</svg>
<p>I wanted to see if I could also translate that to R – base <code>plot</code> can draw line
segments just fine, and I was curious about colouring the squares my own way.</p>
<p>Most of that code translates straightforwardly, with the exception that the
‘randomness’ is actually a sequence of values, starting with a specific seed. I
<a href="https://fosstodon.org/@jonocarroll/116417076133600830" rel="nofollow" target="_blank">tooted</a> recently about
an older <a href="https://jcarroll.com.au/2016/05/30/seed/" rel="nofollow" target="_blank">post of mine</a> which (ab)uses
the <code>set.seed()</code> function to generate specific ‘random’ words</p>
<pre>printStr &lt;- function(str) paste(str, collapse=&quot;&quot;)

set.seed(2505587); x &lt;- sample(LETTERS, 5, replace=TRUE)
set.seed(11135560); y &lt;- sample(LETTERS, 5, replace=TRUE)

paste(printStr(x), printStr(y))
## [1] &quot;HELLO WORLD&quot;</pre>
<p>which I was inspired to revisit based on
<a href="https://www.andrewheiss.com/blog/2026/04/13/seeds-predetermined-universes/" rel="nofollow" target="_blank">a post by Andrew Heiss</a>.</p>
<p>The <code>Random</code> class in that Python translation holds a mutable seed; its
<code>next</code> method takes two bounds and, each time it’s called, returns a new
value scaled into that range</p>
<pre>class Random:
    def __init__(self, seed):
        self.JI = seed

    def next(self, JA, JE):
        self.JI = (self.JI * 5) % 2147483648
        return self.JI / 2147483648 * (JE-JA) + JA
      
r = Random(1)
r.next(2, 3)
## 2.0000000023283064
r.next(2, 3)
## 2.000000011641532
r.next(2, 3)
## 2.000000058207661</pre>
<p>with the added complexity that the subsequent calls <em>update the seed itself</em>.</p>
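<p>For context, this update rule is a tiny linear congruential generator: multiplier 5, modulus 2<sup>31</sup>, so the raw state after <em>n</em> calls is just 5<sup><em>n</em></sup> × seed mod 2<sup>31</sup>, and the whole stream is fixed by the starting seed. A quick sanity check of that closed form (plain Python, mirroring the class above):</p>

```python
MOD = 2_147_483_648  # 2^31, the modulus used by the Random class above

def raw_states(seed, n):
    """Iterate the update rule state = (state * 5) % MOD, n times."""
    state, out = seed, []
    for _ in range(n):
        state = (state * 5) % MOD
        out.append(state)
    return out

states = raw_states(1, 3)
print(states)  # [5, 25, 125]

# Closed form agrees with the iterated update:
print(states == [pow(5, k, MOD) for k in (1, 2, 3)])  # True
```

Dividing those raw states by 2<sup>31</sup> and scaling into <code>(JA, JE)</code> gives exactly the <code>2.0000000023…</code> values printed above.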
<p>When I first saw this, my mind went back to reading the
<a href="https://stat.auckland.ac.nz/~ihaka/downloads/R-paper.pdf" rel="nofollow" target="_blank">“original” R paper</a>
‘R: A Language for Data Analysis and Graphics’ by Ross Ihaka and Robert Gentleman,
in which
<a href="https://fosstodon.org/@jonocarroll/111077962097245990" rel="nofollow" target="_blank">I recalled seeing the cool example of an OO system</a>
maintaining a (non-global) state via <code>&lt;&lt;-</code></p>
<div class="float">
<img src="https://i2.wp.com/jcarroll.com.au/2026/04/17/schotter-plots-in-r/images/state.png?w=578&#038;ssl=1" alt="Maintaining the state of the total balance internal to the function" data-recalc-dims="1" />
<div class="figcaption">Maintaining the state of the total balance internal to the function</div>
</div>
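<p>The trick in that figure can be reproduced in a couple of lines; a minimal sketch of a closure keeping its own counter (the function name here is my own):</p>

```r
# make_counter() returns a function whose 'count' lives in the closure's
# environment; <<- updates that environment, not the global workspace
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()  # 1
counter()  # 2
```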
<p>With this same trick we can write an equivalent of the <code>Random</code> class which also
updates the seed internally</p>
<pre>random &lt;- function(seed) {
  list(
    nextval = function(a, b) { 
      seed &lt;&lt;- (seed * 5) %% 2147483648
      seed / 2147483648 * (b-a) + a
    }
  )
}

r &lt;- random(1)
print(r$nextval(2, 3), digits = 16)
## [1] 2.000000002328306
print(r$nextval(2, 3), digits = 16)
## [1] 2.000000011641532
print(r$nextval(2, 3), digits = 16)
## [1] 2.000000058207661</pre>
<p>Cool!</p>
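<p>As a quick sanity check that each closure carries its own independent state, two generators created with the same seed reproduce exactly the same sequence (a minimal sketch, re-using the <code>random()</code> function defined above):</p>

```r
# Closure-based generator, as above: each call to nextval() updates
# 'seed' in the enclosing environment via <<-
random <- function(seed) {
  list(
    nextval = function(a, b) {
      seed <<- (seed * 5) %% 2147483648
      seed / 2147483648 * (b - a) + a
    }
  )
}

r1 <- random(1)
r2 <- random(1)
a <- c(r1$nextval(2, 3), r1$nextval(2, 3), r1$nextval(2, 3))
b <- c(r2$nextval(2, 3), r2$nextval(2, 3), r2$nextval(2, 3))
identical(a, b)  # TRUE: same seed, same sequence
```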
<p>The rest of the translation is mostly aligning to base plot syntax.</p>
<p>This is what I ended up with</p>
<pre>draw_square &lt;- function(x, y, i, r1, r2, col) {
  r &lt;- 5 * 1.4142             # half-diagonal of a 10-unit square (5 * sqrt(2))
  move_limit &lt;- 5 * i / 264   # displacement grows as i increases down the grid
  twist_limit &lt;- pi/4 * i / 264
  
  y_center &lt;- y + 5 + r1$nextval(-move_limit, move_limit)
  x_center &lt;- x + 5 + r1$nextval(-move_limit, move_limit)
  angle &lt;- r2$nextval(pi/4 - twist_limit, pi/4 + twist_limit)
  
  x0 &lt;- x_center + r * sin(angle)
  y0 &lt;- y_center + r * cos(angle)
  
  # walk the four corners, drawing each edge as a line segment
  for (step in 1:4) {
    angle &lt;- angle + pi / 2
    x1 &lt;- x_center + r * sin(angle)
    y1 &lt;- y_center + r * cos(angle)
    segments(x0, y0, x1, y1, lwd = 1.75, col = col)
    x0 &lt;- x1
    y0 &lt;- y1
  }
}

draw_plot &lt;- function(x_size, y_size, x_count, y_count, s1, s2) {
  r1 &lt;- random(s1)
  r2 &lt;- random(s2)
  
  # reversed ylim so the grid is drawn top-down, as in the original
  plot(NULL, NULL, xlim = c(-60, 60), ylim = c(120, -120), axes = FALSE, ann = FALSE)
  
  y &lt;- -y_size * y_count * 0.5
  x0 &lt;- -x_size * x_count * 0.5
  i &lt;- 0
  
  for (z in 1:y_count) {
    x &lt;- x0
    for (zz in 1:x_count) {
      draw_square(x, y, i, r1, r2, &quot;black&quot;)
      x &lt;- x + x_size
      i &lt;- i + 1
    }
    y &lt;- y + y_size
  }
}

draw_plot(10.0, 10.0, 12, 22, 1922110153, 1769133315)</pre>
<div class="figure"><span style="display:block;" id="fig:unnamed-chunk-5"></span>
<img src="https://i1.wp.com/jcarroll.com.au/2026/04/17/schotter-plots-in-r/unnamed-chunk-5-1.png?w=450&#038;ssl=1" alt="'Schotter' in R" data-recalc-dims="1" />
<p class="caption">
Figure 1: ‘Schotter’ in R
</p>
</div>
<p>This uses the special seeds discovered in that original post. Checking the
rotations, it does indeed appear to match the original art.</p>
<p>Why stop there, though? Now that I can plot it, I can change things… what if I
used a different set of seeds, e.g. swapped them?</p>
<pre>draw_plot(10.0, 10.0, 12, 22, 1769133315, 1922110153)</pre>
<div class="figure"><span style="display:block;" id="fig:unnamed-chunk-6"></span>
<img src="https://i1.wp.com/jcarroll.com.au/2026/04/17/schotter-plots-in-r/unnamed-chunk-6-1.png?w=450&#038;ssl=1" alt="'Schotter' in R with swapped seeds" data-recalc-dims="1" />
<p class="caption">
Figure 2: ‘Schotter’ in R with swapped seeds
</p>
</div>
<p>or completely different values?</p>
<pre>draw_plot(10.0, 10.0, 12, 22, 12345, 67890)</pre>
<div class="figure"><span style="display:block;" id="fig:unnamed-chunk-7"></span>
<img src="https://i2.wp.com/jcarroll.com.au/2026/04/17/schotter-plots-in-r/unnamed-chunk-7-1.png?w=450&#038;ssl=1" alt="'Schotter' in R with new seeds" data-recalc-dims="1" />
<p class="caption">
Figure 3: ‘Schotter’ in R with new seeds
</p>
</div>
<p>What about changing the colours? I could plot the colour as a function of the
progression down the grid, which I think looks pretty cool.</p>
<pre>draw_plot &lt;- function(x_size, y_size, x_count, y_count, s1, s2) {
  r1 &lt;- random(s1)
  r2 &lt;- random(s2)
  
  plot(NULL, NULL, xlim = c(-60, 60), ylim = c(120, -120), axes = FALSE, ann = FALSE)
  
  y &lt;- -y_size * y_count * 0.5
  x0 &lt;- -x_size * x_count * 0.5
  i &lt;- 0
  
  for (z in 1:y_count) {
    x &lt;- x0
    # colour each row by its position down the grid
    rcol &lt;- scales::viridis_pal(option = &quot;viridis&quot;)(y_count)[z]
    for (zz in 1:x_count) {
      draw_square(x, y, i, r1, r2, rcol)
      x &lt;- x + x_size
      i &lt;- i + 1
    }
    y &lt;- y + y_size
  }
}

draw_plot(10.0, 10.0, 12, 22, 1922110153, 1769133315)</pre>
<div class="figure"><span style="display:block;" id="fig:unnamed-chunk-8"></span>
<img src="https://i0.wp.com/jcarroll.com.au/2026/04/17/schotter-plots-in-r/unnamed-chunk-8-1.png?w=450&#038;ssl=1" alt="'Schotter' in R with viridis colours" data-recalc-dims="1" />
<p class="caption">
Figure 4: ‘Schotter’ in R with viridis colours
</p>
</div>
<p>Ever since I first drafted this post, I’ve seen other examples of similar work.
<a href="https://mastodon.social/@safest_integer/114296256313964335" rel="nofollow" target="_blank">This toot</a>
demonstrated a simplified version</p>
<pre>suppressPackageStartupMessages(library(tidyverse))
crossing(x=0:10, y=x) |&gt;  
  mutate(dx = rnorm(n(), 0, (y/20)^1.5),  
         dy = rnorm(n(), 0, (y/20)^1.5)) |&gt;  
  ggplot() +  
  geom_tile(aes(x=x+dx, y=y+dy, fill=y), colour=&#39;black&#39;,  
            lwd=2, width=1, height=1, alpha=0.8, show.legend=FALSE) +  
  scale_fill_gradient(high=&#39;#9f025e&#39;, low=&#39;#f9c929&#39;) +  
  scale_y_reverse() + theme_void()</pre>
<div class="figure"><span style="display:block;" id="fig:unnamed-chunk-9"></span>
<img src="https://i1.wp.com/jcarroll.com.au/2026/04/17/schotter-plots-in-r/unnamed-chunk-9-1.png?w=450&#038;ssl=1" alt="https://mastodon.social/@safest_integer/114296256313964335" data-recalc-dims="1" />
<p class="caption">
Figure 5: <a href="https://mastodon.social/@safest_integer/114296256313964335" class="uri" rel="nofollow" target="_blank">https://mastodon.social/@safest_integer/114296256313964335</a>
</p>
</div>
<p>while <a href="https://fosstodon.org/deck/@arclight@oldbytes.space/116380342929564058" rel="nofollow" target="_blank">this one</a>
showed a book ‘Crisis Engineering’ with a similar idea</p>
<div class="float">
<img src="https://i1.wp.com/jcarroll.com.au/2026/04/17/schotter-plots-in-r/images/crisis.jpg?w=578&#038;ssl=1" alt="Cover of ‘Crisis Engineering’" data-recalc-dims="1" />
<div class="figcaption">Cover of ‘Crisis Engineering’</div>
</div>
<p>I’m sure I’ve seen others around, too.</p>
<p>This was a fun exploration of some artistically inspired code translation, and I got
to stretch my ‘maintaining internal state’ muscles a little. I have no doubt that
someone more artistic than I could do a lot more with it.</p>
<p>As always, I can be found on
<a href="https://fosstodon.org/@jonocarroll" rel="nofollow" target="_blank">Mastodon</a> and the comment section below.</p>
<br />
<details>
<summary>
<tt>devtools::session_info()</tt>
</summary>
<pre>## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.5.3 (2026-03-11)
##  os       macOS Tahoe 26.3.1
##  system   aarch64, darwin20
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Australia/Adelaide
##  date     2026-04-17
##  pandoc   3.6.3 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
##  quarto   1.7.31 @ /usr/local/bin/quarto
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package      * version date (UTC) lib source
##  blogdown       1.23    2026-01-18 [1] CRAN (R 4.5.2)
##  bookdown       0.46    2025-12-05 [1] CRAN (R 4.5.2)
##  bslib          0.10.0  2026-01-26 [1] CRAN (R 4.5.2)
##  cachem         1.1.0   2024-05-16 [1] CRAN (R 4.5.0)
##  cli            3.6.5   2025-04-23 [1] CRAN (R 4.5.0)
##  devtools       2.4.6   2025-10-03 [1] CRAN (R 4.5.0)
##  digest         0.6.39  2025-11-19 [1] CRAN (R 4.5.2)
##  dplyr        * 1.2.0   2026-02-03 [1] CRAN (R 4.5.2)
##  ellipsis       0.3.2   2021-04-29 [1] CRAN (R 4.5.0)
##  evaluate       1.0.5   2025-08-27 [1] CRAN (R 4.5.0)
##  farver         2.1.2   2024-05-13 [1] CRAN (R 4.5.0)
##  fastmap        1.2.0   2024-05-15 [1] CRAN (R 4.5.0)
##  forcats      * 1.0.1   2025-09-25 [1] CRAN (R 4.5.0)
##  fs             1.6.7   2026-03-06 [1] CRAN (R 4.5.2)
##  generics       0.1.4   2025-05-09 [1] CRAN (R 4.5.0)
##  ggplot2      * 4.0.2   2026-02-03 [1] CRAN (R 4.5.2)
##  glue           1.8.0   2024-09-30 [1] CRAN (R 4.5.0)
##  gtable         0.3.6   2024-10-25 [1] CRAN (R 4.5.0)
##  hms            1.1.4   2025-10-17 [1] CRAN (R 4.5.0)
##  htmltools      0.5.9   2025-12-04 [1] CRAN (R 4.5.2)
##  jquerylib      0.1.4   2021-04-26 [1] CRAN (R 4.5.0)
##  jsonlite       2.0.0   2025-03-27 [1] CRAN (R 4.5.0)
##  knitr          1.51    2025-12-20 [1] CRAN (R 4.5.2)
##  labeling       0.4.3   2023-08-29 [1] CRAN (R 4.5.0)
##  lattice        0.22-9  2026-02-09 [1] CRAN (R 4.5.3)
##  lifecycle      1.0.5   2026-01-08 [1] CRAN (R 4.5.2)
##  lubridate    * 1.9.5   2026-02-04 [1] CRAN (R 4.5.2)
##  magrittr       2.0.4   2025-09-12 [1] CRAN (R 4.5.0)
##  Matrix         1.7-4   2025-08-28 [1] CRAN (R 4.5.3)
##  memoise        2.0.1   2021-11-26 [1] CRAN (R 4.5.0)
##  otel           0.2.0   2025-08-29 [1] CRAN (R 4.5.0)
##  pillar         1.11.1  2025-09-17 [1] CRAN (R 4.5.0)
##  pkgbuild       1.4.8   2025-05-26 [1] CRAN (R 4.5.0)
##  pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 4.5.0)
##  pkgload        1.5.0   2026-02-03 [1] CRAN (R 4.5.2)
##  png            0.1-9   2026-03-15 [1] CRAN (R 4.5.2)
##  purrr        * 1.2.1   2026-01-09 [1] CRAN (R 4.5.2)
##  R6             2.6.1   2025-02-15 [1] CRAN (R 4.5.0)
##  RColorBrewer   1.1-3   2022-04-03 [1] CRAN (R 4.5.0)
##  Rcpp           1.1.1   2026-01-10 [1] CRAN (R 4.5.2)
##  readr        * 2.2.0   2026-02-19 [1] CRAN (R 4.5.2)
##  remotes        2.5.0   2024-03-17 [1] CRAN (R 4.5.0)
##  reticulate     1.45.0  2026-02-13 [1] CRAN (R 4.5.2)
##  rlang          1.1.7   2026-01-09 [1] CRAN (R 4.5.2)
##  rmarkdown      2.30    2025-09-28 [1] CRAN (R 4.5.0)
##  rstudioapi     0.18.0  2026-01-16 [1] CRAN (R 4.5.2)
##  S7             0.2.1   2025-11-14 [1] CRAN (R 4.5.2)
##  sass           0.4.10  2025-04-11 [1] CRAN (R 4.5.0)
##  scales         1.4.0   2025-04-24 [1] CRAN (R 4.5.0)
##  sessioninfo    1.2.3   2025-02-05 [1] CRAN (R 4.5.0)
##  stringi        1.8.7   2025-03-27 [1] CRAN (R 4.5.0)
##  stringr      * 1.6.0   2025-11-04 [1] CRAN (R 4.5.0)
##  tibble       * 3.3.1   2026-01-11 [1] CRAN (R 4.5.2)
##  tidyr        * 1.3.2   2025-12-19 [1] CRAN (R 4.5.2)
##  tidyselect     1.2.1   2024-03-11 [1] CRAN (R 4.5.0)
##  tidyverse    * 2.0.0   2023-02-22 [1] CRAN (R 4.5.0)
##  timechange     0.4.0   2026-01-29 [1] CRAN (R 4.5.2)
##  tzdb           0.5.0   2025-03-15 [1] CRAN (R 4.5.0)
##  usethis        3.2.1   2025-09-06 [1] CRAN (R 4.5.0)
##  vctrs          0.7.1   2026-01-23 [1] CRAN (R 4.5.2)
##  viridisLite    0.4.3   2026-02-04 [1] CRAN (R 4.5.2)
##  withr          3.0.2   2024-10-28 [1] CRAN (R 4.5.0)
##  xfun           0.56    2026-01-18 [1] CRAN (R 4.5.2)
##  yaml           2.3.12  2025-12-10 [1] CRAN (R 4.5.2)
## 
##  [1] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
##  * ── Packages attached to the search path.
## 
## ─ Python configuration ───────────────────────────────────────────────────────
##  python:         /Users/jono/.cache/uv/archive-v0/3n3euDImmjsw3EYTJjfeY/bin/python
##  libpython:      /Users/jono/.local/share/uv/python/cpython-3.12.12-macos-aarch64-none/lib/libpython3.12.dylib
##  pythonhome:     /Users/jono/.cache/uv/archive-v0/3n3euDImmjsw3EYTJjfeY:/Users/jono/.cache/uv/archive-v0/3n3euDImmjsw3EYTJjfeY
##  virtualenv:     /Users/jono/.cache/uv/archive-v0/3n3euDImmjsw3EYTJjfeY/bin/activate_this.py
##  version:        3.12.12 (main, Oct 28 2025, 11:52:25) [Clang 20.1.4 ]
##  numpy:          /Users/jono/.cache/uv/archive-v0/3n3euDImmjsw3EYTJjfeY/lib/python3.12/site-packages/numpy
##  numpy_version:  2.4.4
##  
##  NOTE: Python version was forced by VIRTUAL_ENV
## 
## ──────────────────────────────────────────────────────────────────────────────</pre>
</details>
<p><br /></p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://jcarroll.com.au/2026/04/17/schotter-plots-in-r/"> rstats on Irregularly Scheduled Programming</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/schotter-plots-in-r/">Schotter Plots in R</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400617</post-id>	</item>
		<item>
		<title>What&#8217;s new in R 4.6.0?</title>
		<link>https://www.r-bloggers.com/2026/04/whats-new-in-r-4-6-0/</link>
		
		<dc:creator><![CDATA[The Jumping Rivers Blog]]></dc:creator>
		<pubDate>Thu, 16 Apr 2026 23:59:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.jumpingrivers.com/blog/whats-new-r46/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>R 4.6.0 (“Because it was There”) is set for release on April 24th 2026. Here we summarise some of<br />
the more interesting changes that have been introduced. In previous blog posts, we have discussed<br />
the new features introduced in<br />
R 4.5.0<br />
and earlier versions (see the links at the end of this post).<br />
...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/whats-new-in-r-4-6-0/">What’s new in R 4.6.0?</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.jumpingrivers.com/blog/whats-new-r46/"> The Jumping Rivers Blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>
<a href = "https://www.jumpingrivers.com/blog/whats-new-r46/">
<img src="https://i0.wp.com/www.jumpingrivers.com/blog/whats-new-r46/featured.jpg?w=400&#038;ssl=1" style="width:400px" class="image-center" style="display: block; margin: auto;" data-recalc-dims="1" />
</a>
</p>
<p>R 4.6.0 (“Because it was There”) is set for release on April 24th 2026. Here we summarise some of
the more interesting changes that have been introduced. In previous blog posts, we have discussed
the new features introduced in
<a href="https://www.jumpingrivers.com/blog/whats-new-r45/" rel="nofollow" target="_blank">R 4.5.0</a>
and earlier versions (see the links at the end of this post).</p>
<p>Once R 4.6.0 is released, the full changelog will be available at the
<a href="https://cran.r-project.org/doc/manuals/r-release/NEWS.html" rel="nofollow" target="_blank">r-release ‘NEWS’ page</a>.
If you want to keep up to date with developments in base R, have a look at the
<a href="https://cran.r-project.org/doc/manuals/r-devel/NEWS.html" rel="nofollow" target="_blank">r-devel ‘NEWS’ page</a>.</p>
<aside class="advert">
<p>
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, <a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&#038;utm_medium=banner&#038;utm_campaign=2026-whats-new-r46" rel="nofollow" target="_blank">Jumping Rivers can help</a>.
</p>
</aside>
<h2 id="-values-in-collection"><code>! (values %in% collection)</code></h2>
<p>Code should be readable and easily understood.
Although code isn’t a natural language, there’s something <em>off</em> about code that reads like:</p>
<pre>If not a blog post is readable, I close the browser tab.
</pre>
<p>To check if one (or more) value is in some collection, R has the <code>%in%</code> operator:</p>
<pre>&quot;a&quot; %in% letters
[1] TRUE

&quot;a&quot; %in% LETTERS
[1] FALSE
</pre><p>This is different from the <code>in</code> keyword, which you use when iterating over a collection:</p>
<pre>for (x in letters[1:3]) {
 message(x)
}
# a
# b
# c
</pre><p>Sometimes you want to know whether a value is absent from a collection.
The standard way to do this is to invert results from <code>%in%</code>:</p>
<pre>! &quot;a&quot; %in% LETTERS
[1] TRUE
</pre><p>It’s unambiguous to the R interpreter.
But it can be hard to read and understand &#8211; on scanning that statement, you might forget that <code>!</code>
acts after the <code>%in%</code>.
As such, we often wrap the <code>%in%</code> expression in parentheses to make the code clearer:</p>
<pre>! (&quot;a&quot; %in% LETTERS)
[1] TRUE
</pre><p>For the sake of clarity, many developers have implemented their own absence-checking operator.
Writing a custom operator in R uses similar syntax to that used when writing a function:</p>
<pre>`%NOTIN%` = function(x, y) {
 ! (x %in% y)
}

&quot;a&quot; %NOTIN% LETTERS
[1] TRUE
</pre><p>Were you to write the same code multiple times in the same project, you would write a function.
Similarly, if you (or your team) wrote the same function in multiple files or projects, you might
add it to a package and import it.
So if lots of package developers have implemented the same operator or function, across their CRAN
packages, maybe it should be pushed to a higher plane…</p>
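<p>On versions of R before 4.6.0, the same operator can be built in one line with base <code>Negate()</code>; a minimal sketch (the name simply mirrors the new base operator):</p>

```r
# Pre-4.6.0 sketch: build a "not in" operator by negating %in%
`%notin%` <- Negate(`%in%`)

"a" %notin% LETTERS  # TRUE: "a" is not among the upper-case letters
"a" %notin% letters  # FALSE
```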
<p>That is what has happened with the introduction of <code>%notin%</code> in R 4.6.0.
An operator that was found across lots of separate packages has been moved up into base R:</p>
<pre>&quot;a&quot; %notin% LETTERS
[1] TRUE

&quot;a&quot; %notin% letters
[1] FALSE
</pre><h2 id="doi-citations">DOI citations</h2>
<p>If you use R in your publications or your projects, you may need to provide a citation for it.
rOpenSci has a
<a href="https://ropensci.org/blog/2021/11/16/how-to-cite-r-and-r-packages/" rel="nofollow" target="_blank">blog post about citing R and R packages</a> &#8211; why, when and how to do it.</p>
<p>For the R project as a whole, there is a simple function <code>citation()</code> that provides the information
you need:</p>
<pre>citation()

To cite R in publications use:

 R Core Team (2026). _R: A Language and Environment for Statistical
 Computing_. R Foundation for Statistical Computing, Vienna, Austria.
 doi:10.32614/R.manuals &lt;https://doi.org/10.32614/R.manuals&gt;.
 &lt;https://www.R-project.org/&gt;.

A BibTeX entry for LaTeX users is

 @Manual{,
 ...
 }
...
</pre><p>In R 4.6.0, a <a href="https://www.doi.org/the-identifier/what-is-a-doi/" rel="nofollow" target="_blank">DOI</a> (Digital Object Identifier)
has been added to the citation entry to make it easier to reference R in your published work.</p>
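<p>If you just want the BibTeX entry (for example, to paste into a <code>.bib</code> file), base R can extract it directly; a small sketch:</p>

```r
# toBibtex() (from utils, attached by default) converts the citation
# object into a BibTeX entry ready for a .bib file; on R 4.6.0 the
# entry also carries the new doi field
bib <- toBibtex(citation())
bib
```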
<h2 id="summarycharacter_vector-charactermethod--factor"><code>summary(character_vector, character.method = &quot;factor&quot;)</code></h2>
<p><code>str()</code> and <code>summary()</code> are two of the first functions I reach for when exploring a dataset.
For a data-frame, these tell me</p>
<ul>
<li>what type of columns are present (<code>str()</code>: <code>chr</code>, <code>num</code>, <code>Date</code>, …); and</li>
<li>what is present in each column (<code>summary()</code>: gives the min, max, mean of each numeric column, for
example).</li>
</ul>
<p><code>summary()</code> works with data-structures other than just data-frames.</p>
<p>For factors, <code>summary()</code> tells you how many observations of the factor levels were observed:</p>
<pre># &#39;species&#39; and &#39;island&#39; are factor columns in `penguins`
summary(penguins[1:3])

      species          island       bill_len    
 Adelie   :152   Biscoe   :168   Min.   :32.10  
 Chinstrap: 68   Dream    :124   1st Qu.:39.23  
 Gentoo   :124   Torgersen: 52   Median :44.45  
                                 Mean   :43.92  
                                 3rd Qu.:48.50  
                                 Max.   :59.60  
                                 NA&#39;s   :2
</pre><p>For character columns, the output from <code>summary()</code> has been a little obtuse:</p>
<pre># R 4.5.0

# &#39;studyName&#39; and &#39;Species&#39; are character columns in `penguins_raw`
summary(penguins_raw[1:3])

  studyName         Sample Number      Species         
 Length:344         Min.   :  1.00   Length:344        
 Class :character   1st Qu.: 29.00   Class :character  
 Mode  :character   Median : 58.00   Mode  :character  
                    Mean   : 63.15                     
                    3rd Qu.: 95.25                     
                    Max.   :152.00
</pre><p>R 4.6.0 adds a neater way to summarise character vectors/columns to <code>summary()</code>:</p>
<pre># R 4.6.0
summary(penguins_raw[1:3])

  studyName       Sample Number       Species      
 Length   :344    Min.   :  1.00   Length   :344   
 N.unique :  3    1st Qu.: 29.00   N.unique :  3   
 N.blank  :  0    Median : 58.00   N.blank  :  0   
 Min.nchar:  7    Mean   : 63.15   Min.nchar: 33   
 Max.nchar:  7    3rd Qu.: 95.25   Max.nchar: 41   
                  Max.   :152.00
</pre><p>We can also summarise character vectors/columns as if they were factors:</p>
<pre># R 4.6.0
summary(penguins_raw[1:3], character.method = &quot;factor&quot;)

   studyName    Sample Number                      Species                   
 PAL0708:110   Min.   :  1.00   Adelie Penguin (Pygoscelis adeliae)      :152  
 PAL0809:114   1st Qu.: 29.00   Chinstrap penguin (Pygoscelis antarctica): 68  
 PAL0910:120   Median : 58.00   Gentoo penguin (Pygoscelis papua)        :124  
               Mean   : 63.15                                                  
               3rd Qu.: 95.25                                                  
               Max.   :152.00
</pre><h2 id="listfiles-fixed--true"><code>list.files(..., fixed = TRUE)</code></h2>
<p>Suppose there are three files in my working directory: <code>abc.Rmd</code>, <code>CONTRIBUTING.md</code> and <code>README.md</code>.
If I want to obtain the filenames of the “.md” files from an R script, I can list the files that
match a pattern:</p>
<pre># R 4.5.0
list.files(pattern = &quot;.md&quot;)
[1] &quot;abc.Rmd&quot; &quot;CONTRIBUTING.md&quot; &quot;README.md&quot;
</pre><p>Hmmm.</p>
<p>In the pattern, <code>.</code> actually matches any character.
In R 4.5.0, if I want to match the <code>.</code> character explicitly, I can escape it in the pattern.
But <a href="https://r4ds.hadley.nz/regexps.html" rel="nofollow" target="_blank">pattern matching</a> can lead to complicated code in R,
because some characters are treated specially by the pattern matcher, and some are treated specially
by R’s string parser.</p>
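<p>The difference between an escaped and an unescaped <code>.</code> is easy to demonstrate with <code>grepl()</code>; a quick sketch:</p>

```r
# In a regular expression "." matches any single character, so the
# unescaped pattern also matches "abc.Rmd" (the "R" stands in for ".")
grepl(".md",    "abc.Rmd")    # TRUE
grepl("\\.md",  "abc.Rmd")    # FALSE: requires a literal "."
grepl("\\.md",  "README.md")  # TRUE
```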
<p>To tell R to find files with a literal ‘.md’ in the filename, we escape the <code>.</code> character twice:
once for the <code>.</code> (to tell the pattern matcher to match a <code>.</code>, rather than any character),
and once to escape the <code>\</code> (to tell R that the subsequent <code>\</code> is really a backslash):</p>
<pre># R 4.5.0
list.files(pattern = &quot;\\.md&quot;)
[1] &quot;CONTRIBUTING.md&quot; &quot;README.md&quot;
</pre><p>R 4.0.x cleaned that up a bit: we can now use
<a href="https://www.jumpingrivers.com/blog/r-version-4-features/" rel="nofollow" target="_blank">‘raw strings’ in R</a>.
Everything between the parentheses in the next pattern is passed directly to the pattern matcher:</p>
<pre># R 4.5.0
list.files(pattern = r&quot;(\.md)&quot;)
</pre><p>Now, in R 4.6.0, we can indicate to <code>list.files()</code> (and the synonym <code>dir()</code>) that our pattern is a
fixed string (rather than a regular expression). With this, we don’t need to escape the <code>.</code>
character to match the “.md” suffix.</p>
<pre>list.files(pattern = &quot;.md&quot;, fixed = TRUE)
[1] &quot;CONTRIBUTING.md&quot; &quot;README.md&quot;
</pre><h2 id="other-matters">Other matters</h2>
<ul>
<li><code>read.dcf()</code> now allows comment lines, which means you can annotate your config files (e.g., for
{lintr}).</li>
<li><code>df |&gt; plot(col2 ~ col1)</code> can now be used for base plotting; this is a little neater for
exploratory work than <code>df |&gt; plot(col2 ~ col1, data = _)</code></li>
<li>C++20 is now the default C++ standard</li>
</ul>
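<p>For example, a <code>.lintr</code> file (which is in DCF format) could now carry explanatory comments. A hypothetical fragment (assuming <code>#</code>-style comment lines):</p>

```
# Allow longer lines than the default 80 characters
linters: linters_with_defaults(
    line_length_linter(120)
  )
encoding: "UTF-8"
```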
<h2 id="trying-out-r-460">Trying out R 4.6.0</h2>
<p>To take away the pain of installing the latest development version of R, you can use Docker.
To run the <code>devel</code> version of R, use the following commands:</p>
<pre>docker pull rstudio/r-base:devel-jammy
docker run --rm -it rstudio/r-base:devel-jammy
</pre><p>Once R 4.6 is the released version of R and the <code>r-docker</code> repository has been updated, you should
use the following commands to test out R 4.6:</p>
<pre>docker pull rstudio/r-base:4.6-jammy
docker run --rm -it rstudio/r-base:4.6-jammy
</pre><p>An alternative way to install multiple versions of R on the same machine is to use
<a href="https://github.com/r-lib/rig" rel="nofollow" target="_blank"><code>rig</code></a>.</p>
<h2 id="see-also">See also</h2>
<p>The R 4.x versions have introduced a wealth of interesting changes.
These have been summarised in our earlier blog posts:</p>
<ul>
<li><a href="https://www.jumpingrivers.com/blog/r-version-4-features/" rel="nofollow" target="_blank">R 4.0.0</a></li>
<li><a href="https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/" rel="nofollow" target="_blank">R 4.1.0</a></li>
<li><a href="https://www.jumpingrivers.com/blog/new-features-r420/" rel="nofollow" target="_blank">R 4.2.0</a></li>
<li><a href="https://www.jumpingrivers.com/blog/whats-new-r43/" rel="nofollow" target="_blank">R 4.3.0</a></li>
<li><a href="https://www.jumpingrivers.com/blog/whats-new-r44/" rel="nofollow" target="_blank">R 4.4.0</a></li>
<li><a href="https://www.jumpingrivers.com/blog/whats-new-r45/" rel="nofollow" target="_blank">R 4.5.0</a></li>
</ul>
<p>
For updates and revisions to this article, see the <a href = "https://www.jumpingrivers.com/blog/whats-new-r46/">original post</a>
</p>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.jumpingrivers.com/blog/whats-new-r46/"> The Jumping Rivers Blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/whats-new-in-r-4-6-0/">What’s new in R 4.6.0?</a>]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">400593</post-id>	</item>
		<item>
		<title>My Domain: proteome-wide scanning of TMDs</title>
		<link>https://www.r-bloggers.com/2026/04/my-domain-proteome-wide-scanning-of-tmds/</link>
		
		<dc:creator><![CDATA[Stephen Royle]]></dc:creator>
		<pubDate>Thu, 16 Apr 2026 12:04:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://quantixed.org/?p=3753</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> I wanted to know: After a little bit of searching, I couldn’t find any answers. So I decided to use R to retrieve the necessary info from Uniprot and calculate it myself. I thought I’d post it here in case it’s useful for others. Human We’ll ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/my-domain-proteome-wide-scanning-of-tmds/">My Domain: proteome-wide scanning of TMDs</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://quantixed.org/2026/04/16/my-domain-proteome-wide-scanning-of-tmds/"> Rstats – quantixed</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>I wanted to know:</p>



<ol class="wp-block-list">
<li>how many proteins in the human proteome have transmembrane domains?</li>



<li>of those that do, how many have 1 or 2 or n transmembrane domains?</li>
</ol>



<p>After a little bit of searching, I couldn’t find any answers. So I decided to use R to retrieve the necessary info from Uniprot and calculate it myself. I thought I’d post it here in case it’s useful for others.</p>



<h2 class="wp-block-heading">Human</h2>



<figure class="wp-block-image size-large"><img loading="lazy" fetchpriority="high" decoding="async" src="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/human_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1" alt="Bar chart of human protein counts by number of transmembrane domains" class="wp-image-3754" data-recalc-dims="1" /></figure>



<p>We’ll start with the info I wanted. According to Uniprot there are 20,659 proteins in the human proteome. <strong>One quarter of these have one or more TMD</strong>. The majority have a single TMD, and there are almost 1,000 7TM proteins (all those GPCRs, I guess). There are 413 4TM and 327 2TM proteins. We can find examples of 1 through 17 TMDs; there are no proteins with 18 TMDs, 4 proteins with 19, 21 with 24, and 2 with 38.</p>



<p>The analysis is done simply by counting how many <code>TRANSMEM</code> features Uniprot lists for each ID in the reference proteome. I have not distinguished between helical and partial entries, and of course it’s possible that the annotations are not quite correct.</p>
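<p>As a minimal sketch of that counting step (the feature string below is a made-up example, not a real Uniprot entry):</p>

<pre>
library(stringr)

# Uniprot returns the ft_transmem field as one string per protein; each
# annotated TMD contributes one TRANSMEM keyword
example_transmem &lt;- &quot;TRANSMEM 35..58; TRANSMEM 70..92&quot;
str_count(example_transmem, &quot;TRANSMEM&quot;)
# [1] 2
</pre>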



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td>TMDs</td><td>Count</td><td>Frequency (proteome as %)</td><td>Frequency (TMDPs as %)</td></tr><tr><td>1</td><td>2402</td><td>11.6</td><td>45.8</td></tr><tr><td>2</td><td>327</td><td>1.6</td><td>6.2</td></tr><tr><td>3</td><td>159</td><td>0.8</td><td>3.0</td></tr><tr><td>4</td><td>413</td><td>2.0</td><td>7.9</td></tr><tr><td>5</td><td>77</td><td>0.4</td><td>1.5</td></tr><tr><td>6</td><td>276</td><td>1.3</td><td>5.3</td></tr><tr><td>7</td><td>947</td><td>4.6</td><td>18.0</td></tr><tr><td>8</td><td>83</td><td>0.4</td><td>1.6</td></tr><tr><td>9</td><td>63</td><td>0.3</td><td>1.2</td></tr><tr><td>10</td><td>123</td><td>0.6</td><td>2.3</td></tr><tr><td>11</td><td>75</td><td>0.4</td><td>1.4</td></tr><tr><td>12</td><td>202</td><td>1.0</td><td>3.8</td></tr><tr><td>13</td><td>24</td><td>0.1</td><td>0.5</td></tr><tr><td>14</td><td>26</td><td>0.1</td><td>0.5</td></tr><tr><td>15</td><td>13</td><td>0.1</td><td>0.2</td></tr><tr><td>16</td><td>1</td><td>0.0</td><td>0.0</td></tr><tr><td>17</td><td>9</td><td>0.0</td><td>0.2</td></tr><tr><td>19</td><td>4</td><td>0.0</td><td>0.1</td></tr><tr><td>24</td><td>21</td><td>0.1</td><td>0.4</td></tr><tr><td>38</td><td>2</td><td>0.0</td><td>0.0</td></tr></tbody></table></figure>
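<p>The two frequency columns can be reconstructed from the counts like this (only three rows shown; 20,659 is the Uniprot human proteome size quoted above, and 5,247 is the column total, i.e. the number of TMD-containing proteins):</p>

<pre>
proteome_size &lt;- 20659  # human reference proteome entries, per Uniprot
n_tmdp &lt;- 5247          # column total: proteins with at least one TMD

tm_counts &lt;- data.frame(tms = c(1, 2, 7), count = c(2402, 327, 947))
tm_counts$pct_proteome &lt;- round(tm_counts$count / proteome_size * 100, 1)  # 11.6 1.6 4.6
tm_counts$pct_tmdp &lt;- round(tm_counts$count / n_tmdp * 100, 1)             # 45.8 6.2 18.0
</pre>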



<p>Having written this code, I decided to run some other proteomes to see how they compare.</p>



<h2 class="wp-block-heading">Model organisms</h2>



<figure class="wp-block-gallery has-nested-images columns-2 is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex">
<figure data-wp-context="{"imageId":"69e0e19546faa"}" data-wp-interactive="core/image" data-wp-key="69e0e19546faa" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3757" src="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/drosophila_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1" alt="" class="wp-image-3757" srcset_temp="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/drosophila_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/drosophila_uniprot_tm_counts-300x169.png 300w, https://quantixed.org/wp-content/uploads/2026/04/drosophila_uniprot_tm_counts-768x432.png 768w, https://quantixed.org/wp-content/uploads/2026/04/drosophila_uniprot_tm_counts-1536x864.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/drosophila_uniprot_tm_counts.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /></figure>



<figure data-wp-context="{"imageId":"69e0e1954959d"}" data-wp-interactive="core/image" data-wp-key="69e0e1954959d" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3756" src="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/worm_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1" alt="" class="wp-image-3756" srcset_temp="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/worm_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/worm_uniprot_tm_counts-300x169.png 300w, https://quantixed.org/wp-content/uploads/2026/04/worm_uniprot_tm_counts-768x432.png 768w, https://quantixed.org/wp-content/uploads/2026/04/worm_uniprot_tm_counts-1536x864.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/worm_uniprot_tm_counts.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /></figure>



<figure data-wp-context="{"imageId":"69e0e19549b5f"}" data-wp-interactive="core/image" data-wp-key="69e0e19549b5f" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3758" src="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/yeast_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1" alt="" class="wp-image-3758" srcset_temp="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/yeast_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/yeast_uniprot_tm_counts-300x169.png 300w, https://quantixed.org/wp-content/uploads/2026/04/yeast_uniprot_tm_counts-768x432.png 768w, https://quantixed.org/wp-content/uploads/2026/04/yeast_uniprot_tm_counts-1536x864.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/yeast_uniprot_tm_counts.png 1600w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /></figure>



<figure data-wp-context="{"imageId":"69e0e19549fff"}" data-wp-interactive="core/image" data-wp-key="69e0e19549fff" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3755" src="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/zebrafish_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1" alt="" class="wp-image-3755" srcset_temp="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/zebrafish_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/zebrafish_uniprot_tm_counts-300x169.png 300w, https://quantixed.org/wp-content/uploads/2026/04/zebrafish_uniprot_tm_counts-768x432.png 768w, https://quantixed.org/wp-content/uploads/2026/04/zebrafish_uniprot_tm_counts-1536x864.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/zebrafish_uniprot_tm_counts.png 1600w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /></figure>
</figure>



<p>These four organisms have between 18% and 29% of their proteomes made up of proteins with TMDs. The patterns of TMD counts are broadly similar, although yeast has no peak at 7TM, and the peaks at 2, 4, 6, or 12 TMs differ from human.</p>



<p>Maybe this information is out there in some database or other; as I said, I couldn’t find it easily. Even if there are more precise ways of determining TMDs, I think this data is good enough to give a rough idea of the proportions.</p>



<h2 class="wp-block-heading">The code</h2>



<p>I manually downloaded the fasta.gz files for the reference proteomes. They are currently linked <a href="https://www.uniprot.org/proteomes?query=proteome_type%3A1" rel="nofollow" target="_blank">here</a>.</p>



<p>To extract all the Uniprot IDs, I used a shell one-liner:</p>


<pre>
awk &#039;/^&gt;sp\|.*\|/{gsub(/^&gt;sp\|/,&quot;&quot;); gsub(/\|.*/,&quot;&quot;); print &quot;&gt;&quot; $0; next} {print}&#039; file.fasta | grep &quot;^&gt;&quot; | sed &#039;s/&gt;//g&#039; &gt; species_uniprot.txt
</pre>
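<p>For anyone who prefers to stay in R, roughly the same extraction can be sketched as follows (this assumes standard UniprotKB FASTA headers of the form <code>&gt;sp|ACCESSION|ENTRY_NAME description</code>; the function name is mine, not part of any package):</p>

<pre>
extract_sp_ids &lt;- function(fasta_lines) {
  headers &lt;- grep(&quot;^&gt;sp\\|&quot;, fasta_lines, value = TRUE)  # keep SwissProt headers only
  sub(&quot;^&gt;sp\\|([^|]+)\\|.*$&quot;, &quot;\\1&quot;, headers)            # accession sits between the pipes
}

# with a real file: writeLines(extract_sp_ids(readLines(&quot;file.fasta&quot;)), &quot;species_uniprot.txt&quot;)
extract_sp_ids(c(&quot;&gt;sp|P12345|NAME_HUMAN example&quot;, &quot;MSEQVL&quot;))
# [1] &quot;P12345&quot;
</pre>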


<p>Then I used this R script. The main function can probably be simplified; I had to add several checks to make sure I got all the data back from the API. Before posting, I tried to cut it back to make it easier to read, but I only succeeded in breaking the script! This is the working version.</p>


<pre>
# if (!require(&quot;BiocManager&quot;, quietly = TRUE)) {
#   install.packages(&quot;BiocManager&quot;) 
# }
# BiocManager::install(&quot;biomaRt&quot;)
library(httr)
library(stringr)
library(ggplot2)
library(biomaRt)
library(dplyr)
library(tidyr)
library(cowplot)

## FUNCTIONS ----

isJobReady &lt;- function(jobId, pollingInterval = 5, maxWaitSeconds = 3600) {
  if (is.null(jobId) || length(jobId) == 0 || is.na(jobId) || !nzchar(jobId)) {
    return(FALSE)
  }
  nTries &lt;- ceiling(maxWaitSeconds / pollingInterval)
  for (i in 1:nTries) {
    url &lt;- paste(&quot;https://rest.uniprot.org/idmapping/status/&quot;, jobId, sep = &quot;&quot;)
    r &lt;- GET(url = url, accept_json())
    status &lt;- content(r, as = &quot;parsed&quot;)
    if (!is.null(status[[&quot;results&quot;]]) || !is.null(status[[&quot;failedIds&quot;]])) {
      return(TRUE)
    }
    if (!is.null(status[[&quot;messages&quot;]])) {
      print(status[[&quot;messages&quot;]])
      return(FALSE)
    }
    Sys.sleep(pollingInterval)
  }
  return(FALSE)
}

retrieveUniprotInfo &lt;- function(x,
                                chunk_size = 5000,
                                maxWaitSeconds = 3600,
                                taxId = &quot;9606&quot;,
                                progress = TRUE) {
  normalize_uniprot_ids &lt;- function(values) {
    values &lt;- trimws(values)
    # Accept FASTA-style headers like: sp|P12345|... or tr|A0A...|...
    m &lt;- str_match(values, regex(&quot;^&gt;?\\s*(?:sp|tr)\\|([^|]+)\\|&quot;, ignore_case = TRUE))
    values &lt;- ifelse(!is.na(m[, 2]), m[, 2], values)
    values
  }
  
  ids &lt;- unique(normalize_uniprot_ids(x))
  ids &lt;- ids[!is.na(ids) &#038; nzchar(ids)]
  if (length(ids) == 0) {
    stop(&quot;No valid identifiers were provided to retrieveUniprotInfo().&quot;)
  }
  
  fields &lt;- &quot;accession,id,protein_name,gene_names,ft_transmem,length,cc_function,cc_subcellular_location,go_p,go_c&quot;
  acc_pattern &lt;- &quot;^[OPQ][0-9][A-Z0-9]{3}[0-9](-[0-9]+)?$|^[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}(-[0-9]+)?$&quot;
  is_accession &lt;- str_detect(ids, acc_pattern)
  
  split_into_chunks &lt;- function(values, chunk_size = chunk_size) {
    split(values, ceiling(seq_along(values) / chunk_size))
  }
  
  get_next_link &lt;- function(link_header) {
    if (is.null(link_header)) {
      return(NULL)
    }
    links &lt;- unlist(strsplit(link_header, &quot;,\\s*&quot;))
    next_link &lt;- links[str_detect(links, &quot;rel=\\\&quot;next\\\&quot;&quot;)]
    if (length(next_link) == 0) {
      return(NULL)
    }
    next_url &lt;- str_extract(next_link[1], &quot;(?&lt;=&lt;).+?(?=&gt;)&quot;)
    if (is.na(next_url) || !nzchar(next_url)) {
      return(NULL)
    }
    next_url
  }
  
  read_tsv_response &lt;- function(resp) {
    read.table(
      text = content(resp, as = &quot;text&quot;, encoding = &quot;UTF-8&quot;),
      sep = &quot;\t&quot;,
      header = TRUE,
      fill = TRUE,
      quote = &quot;&quot;,
      comment.char = &quot;&quot;,
      check.names = FALSE
    )
  }
  
  fetch_from_redirect &lt;- function(redirect_url) {
    if (is.null(redirect_url) || length(redirect_url) == 0 ||
        is.na(redirect_url) || !nzchar(redirect_url)) {
      return(NULL)
    }
    
    # The paged idmapping results endpoint is capped at size &lt;= 500.
    # Use stream endpoint to retrieve the full chunk in one response.
    stream_url &lt;- gsub(&quot;/results/&quot;, &quot;/results/stream/&quot;, redirect_url)
    sep &lt;- ifelse(str_detect(stream_url, &quot;\\?&quot;), &quot;&#038;&quot;, &quot;?&quot;)
    stream_url &lt;- paste0(
      stream_url,
      sep,
      &quot;fields=&quot;, URLencode(fields, reserved = TRUE),
      &quot;&#038;format=tsv&quot;
    )
    
    r &lt;- GET(url = stream_url)
    if (status_code(r) &lt; 400) {
      return(read_tsv_response(r))
    }
    
    # Fallback to paged endpoint if stream is unavailable.
    sep &lt;- ifelse(str_detect(redirect_url, &quot;\\?&quot;), &quot;&#038;&quot;, &quot;?&quot;)
    url &lt;- paste0(
      redirect_url,
      sep,
      &quot;fields=&quot;, URLencode(fields, reserved = TRUE),
      &quot;&#038;format=tsv&#038;size=500&quot;
    )
    
    r &lt;- GET(url = url)
    stop_for_status(r)
    resultsTable &lt;- read_tsv_response(r)
    
    next_url &lt;- get_next_link(headers(r)[[&quot;link&quot;]])
    while (!is.null(next_url) &#038;&#038; !is.na(next_url) &#038;&#038; nzchar(next_url)) {
      r &lt;- GET(url = next_url)
      stop_for_status(r)
      resultsTable &lt;- rbind(resultsTable, read_tsv_response(r))
      next_url &lt;- get_next_link(headers(r)[[&quot;link&quot;]])
    }
    
    resultsTable
  }
  
  map_ids &lt;- function(values, from_db, to_db, chunk_size, taxId = NULL,
                      label = &quot;ids&quot;) {
    if (length(values) == 0) {
      return(NULL)
    }
    
    results_list &lt;- list()
    chunks &lt;- split_into_chunks(values, chunk_size = chunk_size)
    n_chunks &lt;- length(chunks)
    for (i in seq_along(chunks)) {
      chunk &lt;- chunks[[i]]
      if (isTRUE(progress)) {
        cat(sprintf(&quot;[UniProt] %s chunk %d/%d (%d ids)\n&quot;,
                    label, i, n_chunks, length(chunk)))
      }
      
      files &lt;- list(
        ids = paste0(chunk, collapse = &quot;,&quot;),
        from = from_db,
        to = to_db
      )
      if (!is.null(taxId)) {
        files$taxId &lt;- taxId
      }
      
      r &lt;- POST(url = &quot;https://rest.uniprot.org/idmapping/run&quot;, body = files,
                encode = &quot;multipart&quot;, accept_json())
      stop_for_status(r)
      submission &lt;- content(r, as = &quot;parsed&quot;, encoding = &quot;UTF-8&quot;)
      
      job_id &lt;- submission[[&quot;jobId&quot;]]
      if (is.null(job_id) || length(job_id) == 0 || is.na(job_id) || !nzchar(job_id)) {
        if (isTRUE(progress)) {
          cat(sprintf(&quot;[UniProt] %s chunk %d/%d: no jobId returned\n&quot;,
                      label, i, n_chunks))
        }
        next
      }
      if (!isJobReady(job_id, maxWaitSeconds = maxWaitSeconds)) {
        if (isTRUE(progress)) {
          cat(sprintf(&quot;[UniProt] %s chunk %d/%d: timeout/not ready\n&quot;,
                      label, i, n_chunks))
        }
        next
      }
      
      details_url &lt;- paste(&quot;https://rest.uniprot.org/idmapping/details/&quot;,
                           job_id, sep = &quot;&quot;)
      r &lt;- GET(url = details_url, accept_json())
      stop_for_status(r)
      details &lt;- content(r, as = &quot;parsed&quot;, encoding = &quot;UTF-8&quot;)
      
      redirect_url &lt;- details[[&quot;redirectURL&quot;]]
      if (is.null(redirect_url) || length(redirect_url) == 0 ||
          is.na(redirect_url) || !nzchar(redirect_url)) {
        if (isTRUE(progress)) {
          cat(sprintf(&quot;[UniProt] %s chunk %d/%d: missing redirectURL\n&quot;,
                      label, i, n_chunks))
        }
        next
      }
      chunk_result &lt;- fetch_from_redirect(redirect_url)
      if (is.null(chunk_result)) {
        if (isTRUE(progress)) {
          cat(sprintf(&quot;[UniProt] %s chunk %d/%d: invalid redirectURL\n&quot;,
                      label, i, n_chunks))
        }
        next
      }
      
      results_list[[length(results_list) + 1]] &lt;- chunk_result
      if (isTRUE(progress)) {
        cat(sprintf(&quot;[UniProt] %s chunk %d/%d: completed\n&quot;,
                    label, i, n_chunks))
      }
    }
    
    if (length(results_list) == 0) {
      return(NULL)
    }
    do.call(rbind, results_list)
  }
  
  accession_ids &lt;- ids[is_accession]
  gene_like_ids &lt;- ids[!is_accession]
  
  acc_results &lt;- map_ids(
    values = accession_ids,
    from_db = &quot;UniProtKB_AC-ID&quot;,
    to_db = &quot;UniProtKB&quot;,
    chunk_size = chunk_size,
    label = &quot;accessions&quot;
  )
  
  gene_results &lt;- map_ids(
    values = gene_like_ids,
    from_db = &quot;Gene_Name&quot;,
    to_db = &quot;UniProtKB-Swiss-Prot&quot;,
    chunk_size = chunk_size,
    taxId = taxId,
    label = &quot;gene_names&quot;
  )
  
  results_list &lt;- list()
  if (!is.null(acc_results)) {
    results_list[[length(results_list) + 1]] &lt;- acc_results
  }
  if (!is.null(gene_results)) {
    results_list[[length(results_list) + 1]] &lt;- gene_results
  }
  
  if (length(results_list) == 0) {
    stop(&quot;No UniProt results were returned. Check identifiers and taxId.&quot;)
  }
  
  resultsTable &lt;- do.call(rbind, results_list)
  if (&quot;Entry&quot; %in% colnames(resultsTable)) {
    resultsTable &lt;- resultsTable[!duplicated(resultsTable$Entry), ]
  }
  return(resultsTable)
}



## SCRIPT ----

species &lt;- c(&quot;human&quot;, &quot;zebrafish&quot;, &quot;drosophila&quot;, &quot;worm&quot;, &quot;yeast&quot;)
sci_names &lt;- c(&quot;human&quot; = &quot;Homo sapiens&quot;, &quot;zebrafish&quot; = &quot;Danio rerio&quot;, &quot;drosophila&quot; = &quot;Drosophila melanogaster&quot;,
               &quot;worm&quot; = &quot;Caenorhabditis elegans&quot;, &quot;yeast&quot; = &quot;Saccharomyces cerevisiae&quot;)

for (org in species) {
  # look up scientific name of org
  sci_name &lt;- sci_names[org]
  output_path &lt;- paste0(&quot;Output/Data/&quot;, org, &quot;_uniprot.csv&quot;)
  if(file.exists(output_path)) {
    message(paste(&quot;File&quot;, output_path, &quot;already exists. Loading&quot;, org))
    df &lt;- read.csv(output_path)
  } else {
    message(paste(&quot;Retrieving UniProt info for&quot;, org))
    species_ids &lt;- read.delim(paste0(&quot;Data/&quot;, org, &quot;_uniprot.txt&quot;), header = FALSE)
    names(species_ids) &lt;- c(&quot;uniprot_id&quot;)
    df &lt;- retrieveUniprotInfo(species_ids$uniprot_id)
    # save this result
    write.csv(df, output_path, row.names = FALSE)
  }
  
  df$tms &lt;- str_count(df$Transmembrane, &quot;TRANSMEM&quot;)
  tm_counts &lt;- df %&gt;%
    group_by(tms) %&gt;%
    summarise(count = n()) %&gt;% 
    filter(tms &gt; 0)
  
  p1 &lt;- ggplot(tm_counts, aes(x = tms, y = count)) +
    geom_col(fill = &quot;#009988&quot;) +
    labs(x = &quot;Transmembrane domains&quot;, y = &quot;Count&quot;,
         title = sci_name) +
    lims(x = c(0.5,NA), y = c(0,NA)) +
    theme_bw(10)
  
  p2 &lt;- SuperPlotR::pieplot(x1 = c(sum(df$tms == 0), sum(df$tms &gt; 0)),
                            cols = c(&quot;#bbbbbb&quot;, &quot;#009988&quot;)) +
    # blank background and no legend
    theme_void() +
    theme(legend.position = &quot;none&quot;)
  
  # inset p2 in p1 and add information about the percentages
  p &lt;- ggdraw() +
    draw_plot(p1) +
    # top right
    draw_plot(p2, x = 0.9, y = 0.9, hjust = 1, vjust = 1, width = 0.4, height = 0.4) +
    draw_label(paste0(&quot;Total Proteins: &quot;, nrow(df),
                      &quot;\nNo TM: &quot;, round(sum(df$tms == 0) / nrow(df) * 100, 1),
                      &quot;%\nWith TM(s): &quot;,
                      round(sum(df$tms &gt; 0) / nrow(df) * 100, 1), &quot;%&quot;),
               x = 0.97, y = 0.85, hjust = 1, vjust = 1, size = 8)
  plot_path &lt;- paste0(&quot;Output/Plots/&quot;, org, &quot;_uniprot_tm_counts.png&quot;)
  ggsave(plot_path, p, width = 1600, height = 900, units = &quot;px&quot;, dpi = 300)
}
</pre>


<p>Note that I am using <code>{cowplot}</code> at the end to inset the pie chart and to add the text onto the main plot.</p>



<p>—</p>



<p>The post title comes from “My Domain” by Bernard Butler, from his album People Move On.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://quantixed.org/2026/04/16/my-domain-proteome-wide-scanning-of-tmds/"> Rstats – quantixed</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/my-domain-proteome-wide-scanning-of-tmds/">My Domain: proteome-wide scanning of TMDs</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400597</post-id>	</item>
		<item>
		<title>Expanding the Editorial Team: Alec Robitaille and Lucy D&#8217;Agostino McGowan Join as Editors</title>
		<link>https://www.r-bloggers.com/2026/04/expanding-the-editorial-team-alec-robitaille-and-lucy-dagostino-mcgowan-join-as-editors/</link>
		
		<dc:creator><![CDATA[rOpenSci]]></dc:creator>
		<pubDate>Thu, 16 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://ropensci.org/blog/2026/04/16/editors2026/</guid>

					<description><![CDATA[<p>At rOpenSci, we’re continually grateful for the support and engagement of our community, who help make research open-source stronger, more inclusive, and more collaborative. The software peer review program continues to grow, and today we announ...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/expanding-the-editorial-team-alec-robitaille-and-lucy-dagostino-mcgowan-join-as-editors/">Expanding the Editorial Team: Alec Robitaille and Lucy D’Agostino McGowan Join as Editors</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://ropensci.org/blog/2026/04/16/editors2026/"> rOpenSci - open tools for open science</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>At rOpenSci, we’re continually grateful for the support and engagement of our community, who help make open-source research software stronger, more inclusive, and more collaborative. The <a href="https://ropensci.org/software-review/" rel="nofollow" target="_blank">software peer review program</a> continues to grow, and today we announce that our editorial team is expanding:</p>
<p>We’re excited to welcome <em>Alec Robitaille</em> and <em>Lucy D’Agostino McGowan</em> as new editors. Alec joins our general review team, and Lucy our statistical software review team. Their expertise and dedication will help sustain and strengthen software peer review, ensuring that software reviews continue to meet the highest standards of quality, transparency, and impact.</p>
<p>Meet our new editors!</p>
<h2>
Alec Robitaille
</h2><div class="float-left"><figure class="m-0"><img src="https://i0.wp.com/ropensci.org/img/team/alec-robitaille.png?w=578&#038;ssl=1"
alt="headshot of Alec Robitaille"
style=" object-fit: cover; object-position: center; height: 250px; width: 200px; margin-right: 15px" data-recalc-dims="1"
/>
<p>
Alec is a graduate student at Memorial University of Newfoundland and Labrador (Canada) studying foraging ecology, habitat selection and social networks in caribou and other ungulates. He has been involved in many projects along the way, from measuring muskrat habitat and lake ice dynamics at McGill University to estimating drought sensitivity in Canada’s forests with the Canadian Forestry Service. Passionate about teaching open science and programming, he regularly mentors peers, runs workshops, develops example repositories, and organizes a Bayesian stats colearning group. He maintains the package <a href="https://github.com/ropensci/spatsoc" rel="nofollow" target="_blank">spatsoc</a> (<a href="https://github.com/ropensci/software-review/issues/237" rel="nofollow" target="_blank">reviewed</a> by rOpenSci in 2018) and has developed smaller packages with diverse applications including remote sensing (<a href="https://cran.r-project.org/web/packages/irg/" rel="nofollow" target="_blank">irg</a>), social networks (<a href="https://cran.r-project.org/web/packages/hwig/" rel="nofollow" target="_blank">hwig</a>), camera trap monitoring (<a href="https://github.com/robitalec/camtrapmonitoring" rel="nofollow" target="_blank">camtrapmonitoring</a>), and animal movement (<a href="https://cran.r-project.org/web/packages/distanceto/" rel="nofollow" target="_blank">distanceto</a>). 
He reviewed the <a href="https://github.com/ropensci/software-review/issues/568" rel="nofollow" target="_blank">ohun</a>, <a href="https://github.com/ropensci/software-review/issues/638" rel="nofollow" target="_blank">chopin</a>, and <a href="https://github.com/ropensci/software-review/issues/653" rel="nofollow" target="_blank">emodnet.wfs</a> packages for rOpenSci, guest edited for the <a href="https://github.com/ropensci/software-review/issues/663" rel="nofollow" target="_blank">rredlist</a> package, and is currently handling the reviews for the <a href="https://github.com/ropensci/software-review/issues/732" rel="nofollow" target="_blank">ActiGlobe</a> and <a href="https://github.com/ropensci/software-review/issues/754" rel="nofollow" target="_blank">saperlipopette</a> packages.
</p>
</figure>
</div>
<div style="clear: both;"></div>
<p>Alec on <a href="https://github.com/robitalec" rel="nofollow" target="_blank">GitHub</a>, <a href="http://robitalec.ca/" rel="nofollow" target="_blank">Website</a>.</p>
<blockquote class='blockquote text-left'>
<p class="mb-0">I first connected with the rOpenSci community through the review process for the package spatsoc in 2018 as part of our manuscript submission at Methods in Ecology and Evolution. During the review, I was thoroughly impressed by how welcoming the community was, and how effective the process was in helping me learn how to improve the package. rOpenSci is a landmark in the R and open science ecosystems with an ever evolving community to learn from and to be a part of. I am very grateful to be given the opportunity to continue contributing to rOpenSci in this new role as editor.</p>
<footer class="blockquote-footer">Alec L. Robitaille </footer>
</blockquote>
<h2>
Lucy D’Agostino McGowan
</h2><div class="float-left"><figure class="m-0"><img src="https://i0.wp.com/ropensci.org/img/team/lucy-dagonstino-mcgowan.jpg?w=578&#038;ssl=1"
alt="headshot of Lucy D&#39;Agostino McGowan"
style=" object-fit: cover; object-position: center; height: 250px; width: 200px; margin-right: 15px" data-recalc-dims="1"
/>
<p>
Lucy D’Agostino McGowan is an associate professor in the Department of Statistical Sciences at Wake Forest University. She received her PhD in Biostatistics from Vanderbilt University and completed her postdoctoral training at Johns Hopkins University Bloomberg School of Public Health. Her research focuses on causal inference, statistical communication, analytic design theory, and data science pedagogy. Lucy can be found blogging at <a href="https://livefreeordichotomize.com/" rel="nofollow" target="_blank">livefreeordichotomize.com</a>, on Blue Sky <a href="https://bsky.app/profile/lucystats.bsky.social" rel="nofollow" target="_blank">@LucyStats.bsky.social</a>, and podcasting on <a href="https://open.spotify.com/show/1L8TqB17Peo7jNgXuPObwi" rel="nofollow" target="_blank">Casual Inference</a>.
</p>
</figure>
</div>
<div style="clear: both;"></div>
<p>Lucy on <a href="https://github.com/LucyMcGowan" rel="nofollow" target="_blank">GitHub</a>, <a href="https://www.lucymcgowan.com/" rel="nofollow" target="_blank">Website</a>.</p>
<blockquote class='blockquote text-left'>
<p class="mb-0">I am so thrilled to join the rOpenSci editorial team! I love the rOpenSci community and mission and am grateful for the opportunity to contribute.</p>
<footer class="blockquote-footer">Lucy D’Agostino McGowan </footer>
</blockquote>
<h2>
About the Software Peer Review Program
</h2><p>rOpenSci’s software peer review program brings together volunteers to collaboratively review scientific and statistical software according to transparent, constructive, and open standards. Editors manage submissions, coordinate reviewers, and help guide packages through review to improve code quality, documentation, and usability.</p>
<p>This program is possible thanks to many community members: authors submitting their packages, reviewers volunteering their time and expertise, and editors like Alec and Lucy who help manage reviews and maintain a supportive process.</p>
<h2>
Get Involved
</h2><p>Are you considering submitting your package for review? These resources will help:</p>
<ul>
<li>About <a href="https://ropensci.org/software-review/" rel="nofollow" target="_blank">rOpenSci Software Peer Review</a>;</li>
<li>Browse the online book <a href="https://devguide.ropensci.org/" rel="nofollow" target="_blank">rOpenSci Packages: Development, Maintenance, and Peer Review</a> and <a href="https://stats-devguide.ropensci.org/" rel="nofollow" target="_blank">rOpenSci Statistical Software Peer Review</a>;</li>
<li>Read public <a href="https://github.com/ropensci/software-review/issues" rel="nofollow" target="_blank">software review threads on GitHub</a>.</li>
</ul>
<p>Would you like to review packages? Fill out the <a href="https://airtable.com/app8dssb6a7PG6Vwj/shrnfDI2S9uuyxtDw" rel="nofollow" target="_blank">rOpenSci Reviewer Sign-Up Form</a> to volunteer to review.</p>
<p>Welcome again <strong>Alec and Lucy</strong>! We’re thrilled to have you join the editorial team.</p>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://ropensci.org/blog/2026/04/16/editors2026/"> rOpenSci - open tools for open science</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/expanding-the-editorial-team-alec-robitaille-and-lucy-dagostino-mcgowan-join-as-editors/">Expanding the Editorial Team: Alec Robitaille and Lucy D’Agostino McGowan Join as Editors</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400595</post-id>	</item>
		<item>
		<title>Stage II OSCC — Health Economics Model</title>
		<link>https://www.r-bloggers.com/2026/04/stage-ii-oscc-health-economics-model/</link>
		
		<dc:creator><![CDATA[Joseph Rickert]]></dc:creator>
		<pubDate>Thu, 16 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://rworks.dev/posts/oscc-patient-model/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>Most health care economics models are constructed from the perspective of a managed health care system such as those offered in Canada and several European countries, or from the perspective of some other third party such as an insurance company...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/stage-ii-oscc-health-economics-model/">Stage II OSCC — Health Economics Model</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://rworks.dev/posts/oscc-patient-model/"> R Works</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<p>Most health care economics models are constructed from the perspective of a managed health care system such as those offered in Canada and several European countries, or from the perspective of some other third party such as an insurance company. Although the benefits of constructing models from the patient’s perspective have been discussed in the literature, see for example <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC3134464/" rel="nofollow" target="_blank">Ioannidis &#038; Garber (2011)</a> and <a href="https://pubmed.ncbi.nlm.nih.gov/27712720/" rel="nofollow" target="_blank">Tai et al. (2016)</a>, few models have been published. Obvious reasons for this situation include the privacy issues associated with obtaining patient-specific data and the apparent lack of economic incentives to commit the resources required to abstract individual patient trajectories from patient-level data.</p>
<p>This post explores the hypothesis that detailed, data-driven, patient-specific models may not be necessary to achieve most of the benefits of a health economics model built from the patient’s perspective. It presents a lightweight cohort model that captures the majority of the benefits, to both physicians and patients, that might result from a patient-focused model.</p>
<p>The model is intended to be a proof of concept, a minimal viable model that may be useful to physicians both in deciding on treatment options and in informing patients about potential outcomes and how they may experience them. To see how the medical literature may be helpful for this kind of modeling, I explore the specific case of choosing between surgery and definitive radiation therapy for patients with stage II oral squamous cell carcinoma (OSCC).</p>
<p><strong>Note that the model developed here is not being presented as a medical analysis. No medical experts have been consulted in its construction. It is merely being offered as an example of what could be done to build a health care economics model from a patient’s perspective with today’s modeling tools. Also note that I made significant use of <a href="https://posit.co/blog/introducing-ai-in-rstudio/" rel="nofollow" target="_blank">Posit Assistant</a> for the RStudio IDE configured with Claude Sonnet 4.6, which I found immensely helpful both for code construction and for literature searches, even though its utility for the latter was limited by Claude’s vexing proclivity to hallucinate. </strong></p>
<section id="overview-of-the-model" class="level2">
<h2 class="anchored" data-anchor-id="overview-of-the-model">Overview of the Model</h2>
<p>The recovery of patients undergoing treatment for OSCC, either surgery or radiation therapy, is conceived as a stochastic journey through various health states. The states visited, the sequence in which they are visited, and the length of stay in each health state are modeled as random variables within the framework of an eight-state, continuous-time Markov chain. Estimates of the transition probabilities among states and of the mean time patients remain in each state drive the Markov chain. In the model below, these estimates are derived from the medical literature; however, it is conceivable that clinicians may feel comfortable making their own estimates, or modifying the literature-derived estimates according to their experience and their evaluations of their patients.</p>
<p>The next step uses the theory of continuous-time Markov chains (CTMCs) to construct a synthetic data set for a cohort of patients who vary in age and tumor size. The key insight here is that the synthetic data set is the underlying statistical model. Empirical survival curves, the time spent in each state, and other useful quantities are then calculated from the synthetic data. The patient-specific healthcare model is then constructed by estimating the utility of each health state to the patient. This is accomplished by constructing quality-adjusted life years (QALYs) based on <a href="https://www.ncbi.nlm.nih.gov/books/NBK565680/" rel="nofollow" target="_blank">EQ-5D</a> values reported in the literature.</p>
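<p>As a rough sketch of how such a QALY construction works (the utility weights and durations below are illustrative placeholders, not the literature-derived EQ-5D values used in the model), the calculation amounts to weighting the time spent in each health state by that state&#8217;s utility and summing:</p>
<div class="cell">
<details class="code-fold">
<summary>Illustrative QALY sketch (placeholder values)</summary>
<pre># Illustrative only: placeholder utility weights and time-in-state values
state_utility  &lt;- c(NED = 0.85, LR_Recurrence = 0.60, Death = 0.00)
years_in_state &lt;- c(NED = 3.50, LR_Recurrence = 1.00, Death = 0.00)

# QALYs = sum over states of (years in state) x (utility of state)
sum(state_utility * years_in_state)  # 3.5*0.85 + 1.0*0.60 = 3.575 QALYs</pre>
</details>
</div>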
</section>
<section id="state-diagram-for-a-continuous-time-markov-chain" class="level2">
<h2 class="anchored" data-anchor-id="state-diagram-for-a-continuous-time-markov-chain">State Diagram for a Continuous Time Markov Chain</h2>
<div class="cell">
<details class="code-fold">
<summary>Packages used throughout the post</summary>
<pre>library(ggplot2)
library(grid)
library(dplyr)
library(msm)
library(gt)</pre>
</details>
</div>
<p>It is my understanding that although most patients with stage II OSCC are treated with surgery, there are times when surgery is either not possible or not preferable, and in these exceptional cases radiation treatment is the primary alternative. The following state diagram is an attempt to abstract both the health states a patient will experience following treatment and the probable paths that will be taken among them. Eight states seemed to me to be the minimum number required to represent the complexities of treatment.</p>
<div class="cell">
<details class="code-fold">
<summary>Show code to build state diagram</summary>
<pre># library(ggplot2)
# library(grid)

BG &lt;- &quot;#F0F3F7&quot;

# ── 1. Nodes ─────────────────────────────────────────────────────────────────
# Layout: left=initial treatment, center=adjuvant/NED, right=outcomes/death
nodes &lt;- data.frame(
  id    = 1:8,
  label = c(
    &quot;S1\nSurgery\n± Neck Dissection&quot;,
    &quot;S2\nDefinitive\nRadiation&quot;,
    &quot;S3\nPost-op\nSurveillance&quot;,
    &quot;S4\nAdjuvant RT\n(PORT)&quot;,
    &quot;S5\nAdjuvant\nChemoRT (POCRT)&quot;,
    &quot;S6\nNED\n(Health)&quot;,
    &quot;S7\nLocoregional\nRecurrence&quot;,
    &quot;S8\nDeath&quot;
  ),
  x    = c(1.5,  1.5,  4.5,  4.5,  4.5,  7.0,  9.5,  11.5),
  y    = c(8.5,  3.0,  9.5,  6.5,  3.5,  6.5,  7.5,   5.5),
  fill = c(&quot;#E67E22&quot;,&quot;#8E44AD&quot;,&quot;#F39C12&quot;,&quot;#2980B9&quot;,&quot;#16A085&quot;,
           &quot;#27AE60&quot;,&quot;#C0392B&quot;,&quot;#2C3E50&quot;),
  stringsAsFactors = FALSE
)

# ── 2. Edge definitions: from, to, label, curvature, nudge_x, nudge_y ────────
edge_defs &lt;- list(
  # Surgery → downstream
  c(1, 3,  &quot;No high-risk\nfeatures&quot;,    0.00,  0.10,  0.30),
  c(1, 4,  &quot;High-risk:\nmargins/PNI/\nLVI/nodes&quot;,  0.10, -0.20,  0.20),
  c(1, 5,  &quot;ECE/pos\nmargins&quot;,          0.18,  0.00, -0.30),
  c(1, 8,  &quot;Peri-op\ndeath&quot;,           -0.22,  0.10, -0.20),
  # Definitive RT → downstream
  c(2, 6,  &quot;NED&quot;,                       0.18,  0.00,  0.25),
  c(2, 7,  &quot;Treatment\nfailure&quot;,        0.00,  0.00,  0.25),
  c(2, 8,  &quot;Death&quot;,                     0.14, -0.10, -0.20),
  # Post-op Surveillance → downstream
  c(3, 6,  &quot;&quot;,                          0.00,  0.10,  0.25),
  c(3, 7,  &quot;&quot;,                          0.22,  0.20, -0.10),
  c(3, 8,  &quot;&quot;,                         -0.18,  0.10, -0.18),
  # PORT → downstream
  c(4, 6,  &quot;&quot;,                          0.00,  0.10,  0.25),
  c(4, 7,  &quot;&quot;,                          0.12,  0.10,  0.18),
  c(4, 8,  &quot;&quot;,                          0.14,  0.10, -0.18),
  # POCRT → downstream
  c(5, 6,  &quot;&quot;,                          0.00,  0.10,  0.25),
  c(5, 7,  &quot;&quot;,                          0.00,  0.10,  0.22),
  c(5, 8,  &quot;&quot;,                          0.12,  0.10, -0.18),
  # NED → outcomes
  c(6, 7,  &quot;LR recurrence&quot;,             0.00, -0.10,  0.25),
  c(6, 8,  &quot;Death&quot;,                     0.18,  0.10, -0.14),
  # Locoregional Recurrence → outcomes
  c(7, 8,  &quot;Death&quot;,                     0.00,  0.00,  0.25),
  c(7, 6,  &quot;Salvage\nsuccess&quot;,         -0.28, -0.28,  0.00)
)

edges &lt;- do.call(rbind, lapply(edge_defs, function(r) {
  data.frame(from=as.integer(r[[1]]), to=as.integer(r[[2]]),
             prob=r[[3]], curvature=as.numeric(r[[4]]),
             nudge_x=as.numeric(r[[5]]), nudge_y=as.numeric(r[[6]]),
             stringsAsFactors=FALSE)
}))

# Attach node coordinates
edges &lt;- merge(edges, nodes[, c(&quot;id&quot;,&quot;x&quot;,&quot;y&quot;)], by.x=&quot;from&quot;, by.y=&quot;id&quot;, sort=FALSE)
names(edges)[names(edges)==&quot;x&quot;] &lt;- &quot;x0&quot;; names(edges)[names(edges)==&quot;y&quot;] &lt;- &quot;y0&quot;
edges &lt;- merge(edges, nodes[, c(&quot;id&quot;,&quot;x&quot;,&quot;y&quot;,&quot;fill&quot;)], by.x=&quot;to&quot;, by.y=&quot;id&quot;, sort=FALSE)
names(edges)[names(edges)==&quot;x&quot;] &lt;- &quot;x1&quot;; names(edges)[names(edges)==&quot;y&quot;] &lt;- &quot;y1&quot;
names(edges)[names(edges)==&quot;fill&quot;] &lt;- &quot;dest_col&quot;

# Shorten endpoints to node perimeter
NODE_R &lt;- 0.58
shorten &lt;- function(x0,y0,x1,y1,r) {
  dx&lt;-x1-x0; dy&lt;-y1-y0; d&lt;-sqrt(dx^2+dy^2)
  list(xs=x0+dx*r/d, ys=y0+dy*r/d, xe=x1-dx*r/d, ye=y1-dy*r/d)
}
segs     &lt;- Map(shorten, edges$x0, edges$y0, edges$x1, edges$y1, MoreArgs=list(r=NODE_R))
edges$xs &lt;- sapply(segs,`[[`,&quot;xs&quot;); edges$ys &lt;- sapply(segs,`[[`,&quot;ys&quot;)
edges$xe &lt;- sapply(segs,`[[`,&quot;xe&quot;); edges$ye &lt;- sapply(segs,`[[`,&quot;ye&quot;)
edges$lx &lt;- (edges$xs+edges$xe)/2 + edges$nudge_x
edges$ly &lt;- (edges$ys+edges$ye)/2 + edges$nudge_y

# ── 3. Build layers ───────────────────────────────────────────────────────────
arrow_layers &lt;- lapply(seq_len(nrow(edges)), function(i) {
  e &lt;- edges[i,,drop=FALSE]
  geom_curve(data=e, mapping=aes(x=xs,y=ys,xend=xe,yend=ye),
             curvature=e$curvature[[1]], colour=e$dest_col[[1]],
             linewidth=1.0, alpha=0.80,
             arrow=arrow(length=unit(9,&quot;pt&quot;), type=&quot;closed&quot;, ends=&quot;last&quot;),
             show.legend=FALSE)
})

label_layers &lt;- lapply(seq_len(nrow(edges)), function(i) {
  e &lt;- edges[i,,drop=FALSE]
  if (nchar(trimws(e$prob[[1]])) == 0) return(NULL)
  geom_label(data=e, mapping=aes(x=lx,y=ly,label=prob),
             colour=e$dest_col[[1]], fill=BG, size=2.6, fontface=&quot;bold&quot;,
             label.size=0.22, label.r=unit(0.10,&quot;lines&quot;),
             label.padding=unit(0.15,&quot;lines&quot;), show.legend=FALSE)
})
label_layers &lt;- Filter(Negate(is.null), label_layers)

# ── 4. Base plot ──────────────────────────────────────────────────────────────
base_plot &lt;- ggplot() +
  theme_void(base_size=10) +
  theme(
    plot.background  = element_rect(fill=BG, colour=NA),
    panel.background = element_rect(fill=BG, colour=NA),
    plot.title  = element_text(family=&quot;serif&quot;, face=&quot;bold&quot;, size=16,
                               colour=&quot;#1A2535&quot;, hjust=0.5, margin=margin(b=4)),
    plot.subtitle = element_text(size=9, colour=&quot;#555555&quot;, hjust=0.5,
                                 margin=margin(b=10))
  ) +
  coord_fixed(xlim=c(0, 13), ylim=c(1.5, 11), clip=&quot;off&quot;) +
  labs(title=&quot;Stage II Oral Squamous Cell Carcinoma&quot;,
       subtitle=&quot;CTMC State Transition Diagram&quot;)

# ── 5. Assemble ───────────────────────────────────────────────────────────────
all_layers &lt;- c(arrow_layers, label_layers,
  list(
    # Column labels
    annotate(&quot;text&quot;, x=1.5,  y=10.8, label=&quot;Initial\nTreatment&quot;,
             size=3.2, colour=&quot;#555&quot;, fontface=&quot;italic&quot;, hjust=0.5),
    annotate(&quot;text&quot;, x=4.5,  y=10.8, label=&quot;Post-surgical\nPathway&quot;,
             size=3.2, colour=&quot;#555&quot;, fontface=&quot;italic&quot;, hjust=0.5),
    annotate(&quot;text&quot;, x=7.0,  y=10.8, label=&quot;No Evidence\nof Disease&quot;,
             size=3.2, colour=&quot;#555&quot;, fontface=&quot;italic&quot;, hjust=0.5),
    annotate(&quot;text&quot;, x=9.5,  y=10.8, label=&quot;Disease\nProgression&quot;,
             size=3.2, colour=&quot;#555&quot;, fontface=&quot;italic&quot;, hjust=0.5),
    annotate(&quot;text&quot;, x=11.5, y=10.8, label=&quot;Absorbing\nState&quot;,
             size=3.2, colour=&quot;#555&quot;, fontface=&quot;italic&quot;, hjust=0.5),
    # Nodes
    geom_point(data=nodes, aes(x=x, y=y, colour=fill),
               size=28, show.legend=FALSE),
    scale_colour_identity(),
    geom_text(data=nodes, aes(x=x, y=y, label=label),
              colour=&quot;white&quot;, size=2.8, fontface=&quot;bold&quot;, lineheight=0.9),
    # Surgical candidate bracket
    annotate(&quot;segment&quot;, x=0.25, xend=0.25, y=2.2, yend=9.3,
             colour=&quot;#888&quot;, linewidth=0.5),
    annotate(&quot;segment&quot;, x=0.25, xend=0.45, y=9.3, yend=9.3,
             colour=&quot;#888&quot;, linewidth=0.5),
    annotate(&quot;segment&quot;, x=0.25, xend=0.45, y=2.2, yend=2.2,
             colour=&quot;#888&quot;, linewidth=0.5),
    annotate(&quot;text&quot;, x=0.0, y=5.75, label=&quot;Surgical\ncandidate?\nYes ↑\nNo ↓&quot;,
             size=2.8, colour=&quot;#666&quot;, hjust=0.5, lineheight=0.9)
  )
)

Reduce(&quot;+&quot;, all_layers, init=base_plot)</pre>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><a href="https://i0.wp.com/rworks.dev/posts/oscc-patient-model/index_files/figure-html/unnamed-chunk-2-1.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-1" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/oscc-patient-model/index_files/figure-html/unnamed-chunk-2-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></a></p>
</figure>
</div>
</div>
</div>
<table class="caption-top table">
<colgroup>
<col style="width: 8%">
<col style="width: 30%">
<col style="width: 61%">
</colgroup>
<thead>
<tr class="header">
<th>State</th>
<th>Label</th>
<th>Role</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>S1</td>
<td>Surgery</td>
<td>Initial surgical treatment (days–weeks)</td>
</tr>
<tr class="even">
<td>S2</td>
<td>DefinitiveRT</td>
<td>Non-surgical candidates; 6–7 week RT course</td>
</tr>
<tr class="odd">
<td>S3</td>
<td>PostOpSurveillance</td>
<td>Surgery, no high-risk pathological features</td>
</tr>
<tr class="even">
<td>S4</td>
<td>AdjuvantRT (PORT)</td>
<td>High-risk: margins, PNI, LVI, multiple nodes</td>
</tr>
<tr class="odd">
<td>S5</td>
<td>AdjuvantChemoRT (POCRT)</td>
<td>Highest-risk: ECE or positive margins</td>
</tr>
<tr class="even">
<td>S6</td>
<td>NED</td>
<td>No evidence of disease; long-term surveillance</td>
</tr>
<tr class="odd">
<td>S7</td>
<td>LR Recurrence</td>
<td>Locoregional recurrence at primary or nodes</td>
</tr>
<tr class="even">
<td>S8</td>
<td>Death</td>
<td>Absorbing state</td>
</tr>
</tbody>
</table>
</section>
<section id="synthetic-data-simulation" class="level2">
<h2 class="anchored" data-anchor-id="synthetic-data-simulation">Synthetic Data Simulation</h2>
<p>This section builds a synthetic patient cohort for Stage II (T2 N0 M0) oral squamous cell carcinoma using a continuous-time Markov chain (CTMC). The first code block sets up the simulation by defining the covariates: patient age and tumor size. Mean patient age is set at 62 years. Tumor size ranges from 2 to 4 cm, with depth of invasion (DOI) from 5 to 10 mm.</p>
<div class="cell">
<details class="code-fold">
<summary>Simulation setup</summary>
<pre>set.seed(6413)

N_PATIENTS &lt;- 1000
MAX_TIME   &lt;- 60      # months (5-year follow-up)
obs_base   &lt;- c(0, 1, 2, 3, 6, 9, 12, 18, 24, 36, 48, 60)

state_labels_ii &lt;- c(
  &quot;Surgery&quot;, &quot;DefinitiveRT&quot;, &quot;PostOpSurveillance&quot;,
  &quot;AdjuvantRT_PORT&quot;, &quot;AdjuvantChemoRT_POCRT&quot;,
  &quot;NED&quot;, &quot;LR_Recurrence&quot;, &quot;Death&quot;
)

# ── Patient covariates ────────────────────────────────────────────────────────
# Stage II OSCC (T2 N0 M0): tumor 2-4 cm, DOI 5-10 mm
age           &lt;- as.integer(round(rnorm(N_PATIENTS, mean = 62, sd = 11)))
age           &lt;- pmax(32L, pmin(85L, age))
tumor_size_cm &lt;- round(runif(N_PATIENTS, 2.0, 4.0), 1)
DOI_mm        &lt;- round(runif(N_PATIENTS, 5.0, 10.0), 1)

# Surgical candidacy: logistic model, ~78% surgery at age 60
surgery_prob &lt;- plogis(1.5 - 0.04 * (age - 60))
treatment    &lt;- ifelse(runif(N_PATIENTS) &lt; surgery_prob, &quot;Surgery&quot;, &quot;DefinitiveRT&quot;)</pre>
</details>
</div>
<section id="clinician-inputs" class="level3">
<h3 class="anchored" data-anchor-id="clinician-inputs">Clinician Inputs</h3>
<p>The simulation is driven by two sets of clinician-interpretable inputs that are entered at the beginning of this section of code:</p>
<ul>
<li><strong>Jump chain probabilities</strong> <code>jump_P[i,j]</code>: given that a patient <em>leaves</em> state <em>i</em>, the probability they transition directly to state <em>j</em>.</li>
<li><strong>Mean sojourn times</strong> <code>mean_sojourn[i]</code>: the average time (months) a patient spends in state <em>i</em> before transitioning.</li>
</ul>
<p>The jump chain may be thought of as an embedded discrete-time Markov chain that gives the probabilities of which state the chain will jump to next when it is time to jump. The mean sojourn times are inverted to obtain the transition rates. The values for these parameters have been derived from the literature; however, the model has been set up so that they can be easily changed if the physician thinks that a particular class of patients would be better represented by other values. The justifications for the default parameter values driving the model are provided in the comments in the code and in the references below.</p>
<div class="cell">
<details class="code-fold">
<summary>Show code for model inputs</summary>
<pre># == CLINICIAN INPUTS ==========================================================
#
# jump_P[i, j]  : probability of moving to state j *given* a transition occurs
#                 from state i.  Rows must sum to 1 for states 1-7.
#
# mean_sojourn  : average time (months) in each transient state.
#
# Together these fully specify the CTMC generator:
#   Q[i, j] = jump_P[i, j] / mean_sojourn[i]   (off-diagonal)
# All downstream simulation code derives Q_base automatically from these inputs.

# ── Jump chain transition probabilities ───────────────────────────────────────


jump_P &lt;- matrix(0, 8, 8,
  dimnames = list(paste0(&quot;S&quot;, 1:8), paste0(&quot;S&quot;, 1:8)))

# S1 (Surgery) -&gt; {S3=no high-risk, S4=PORT, S5=POCRT, S8=peri-op death}
# ~20-30% high-risk features -&gt; PORT (Tassone et al. 2023, NCDB n=53,503; DOI:10.1002/ohn.205)
# ~10-15% ECE/pos margins -&gt; POCRT; peri-operative mortality ~1-2% (Nathan et al. 2025, NSQIP n=866; DOI:10.18203/issn.2454-5929.ijohns20252980)
# NB: S8 revised from 0.05 to 0.015 — the original 5% exceeded the literature range of 1-2%;
# freed mass (0.035) reallocated to S3 (residual no-adjuvant group), S4 and S5 unchanged.
jump_P[1, c(3, 4, 5, 8)] &lt;- c(0.585, 0.25, 0.15, 0.015)

# S2 (DefinitiveRT) -&gt; {S6=NED, S7=treatment failure, S8=death}
# Definitive RT alone locoregional control ~65% for T2N0M0 (Dana-Farber Group 2011 PMID 21531515, 2-yr LRC 64%; Studer et al. 2007 T2-4 LC ~50-60%)
jump_P[2, c(6, 7, 8)] &lt;- c(0.65, 0.28, 0.07)

# S3 (PostOpSurveillance) -&gt; {S6=NED, S7=LR recurrence, S8=death}
# 5-yr LR recurrence ~12-18% after margin-negative R0 surgery alone (Ord 2006 15.5%; Luryi 2014 NCDB early-stage series)
jump_P[3, c(6, 7, 8)] &lt;- c(0.74, 0.18, 0.08)

# S4 (PORT) -&gt; {S6=NED, S7=LR recurrence, S8=death}
# After PORT, 3-yr OS ~71% (95% CI 67-75%) in adjuvant PORT cohort (Hosni et al. 2019, PMID: 30244160)
jump_P[4, c(6, 7, 8)] &lt;- c(0.67, 0.23, 0.10)

# S5 (POCRT) -&gt; {S6=NED, S7=LR recurrence, S8=death}
# After POCRT, locoregional control ~60%; Stage II mortality revised down to 10% (Bernier 2004, Cooper 2004; Stage II subgroup)
jump_P[5, c(6, 7, 8)] &lt;- c(0.60, 0.30, 0.10)

# S6 (NED) -&gt; {S7=LR recurrence, S8=death}
# ~75% leave NED via recurrence vs other-cause death (SEER Stage II split)
jump_P[6, c(7, 8)] &lt;- c(0.75, 0.25)

# S7 (LR Recurrence) -&gt; {S6=salvage success, S8=death}
# Salvage success 0.43 / mortality 0.57: Lee et al. 2024 pooled 5-yr OS = 43% (n=2069); revised from c(0.25, 0.75) — previous value over-estimated post-recurrence mortality
jump_P[7, c(6, 8)] &lt;- c(0.43, 0.57)

# ── Mean sojourn times (months) ───────────────────────────────────────────────
mean_sojourn &lt;- c(
  1.5,   # S1  Surgery: weighted mean accounting for real-world S-PORT delays ~8-9 wks (Correia 2026; Dayan 2023; Graboyes 2017)
  3.0,   # S2  DefinitiveRT: RT course (1.5 mo) + post-RT response assessment window &gt;=12 wks (NCCN HNC Guidelines; Mehanna et al. 2016 [PET-NECK] PMID 26958921)
  22.0,  # S3  PostOpSurveillance: weighted mean to event (NED@24mo x0.74 + recur@18mo x0.18 + death@14mo x0.08) = 22 mo (Blatt 2022; Brands 2019; Ord 2006)
  3.0,   # S4  PORT: ~6-week adjuvant RT + recovery — supported, unchanged
  4.0,   # S5  POCRT: ~6-week concurrent chemoRT + extended recovery — supported, unchanged
  120.0, # S6  NED: calibrated to SEER ~20% 5-yr recurrence; exit rate 0.0083/mo -&gt; mean 120 mo (SEER localized OCC; Brands 2019; Luryi 2014)
  12.5,  # S7  LR Recurrence: median OS ~12-18 mo all-comers (Liu 2007; Contrera 2022) — within supported range, unchanged
  NA     # S8  Death (absorbing)
)
names(mean_sojourn) &lt;- paste0(&quot;S&quot;, 1:8)

# == END CLINICIAN INPUTS ======================================================

# ── Derive Q_base from clinician inputs ───────────────────────────────────────
# Q[i,j] = jump_P[i,j] / mean_sojourn[i]
Q_base &lt;- matrix(0, 8, 8)
for (i in 1:7) {
  for (j in which(jump_P[i, ] &gt; 0)) {
    Q_base[i, j] &lt;- jump_P[i, j] / mean_sojourn[i]
  }
}
diag(Q_base) &lt;- -rowSums(Q_base)</pre>
</details>
</div>
<div class="columns">
<div class="column" style="width:45%;">
<section id="jump-chain-transition-probabilities" class="level5">
<h5 class="anchored" data-anchor-id="jump-chain-transition-probabilities">Jump Chain Transition Probabilities</h5>
<div class="cell">
<details class="code-fold">
<summary>Show code</summary>
<pre>as.data.frame(jump_P)</pre>
</details>
<div class="cell-output cell-output-stdout">
<pre>   S1 S2    S3   S4   S5   S6   S7    S8
S1  0  0 0.585 0.25 0.15 0.00 0.00 0.015
S2  0  0 0.000 0.00 0.00 0.65 0.28 0.070
S3  0  0 0.000 0.00 0.00 0.74 0.18 0.080
S4  0  0 0.000 0.00 0.00 0.67 0.23 0.100
S5  0  0 0.000 0.00 0.00 0.60 0.30 0.100
S6  0  0 0.000 0.00 0.00 0.00 0.75 0.250
S7  0  0 0.000 0.00 0.00 0.43 0.00 0.570
S8  0  0 0.000 0.00 0.00 0.00 0.00 0.000</pre>
</div>
</div>
</section>
</div><div class="column" style="width:10%;">

</div><div class="column" style="width:45%;">
<section id="mean-sojourn-times-and-implied-rates" class="level5">
<h5 class="anchored" data-anchor-id="mean-sojourn-times-and-implied-rates">Mean Sojourn Times and Implied Rates</h5>
<div class="cell">
<details class="code-fold">
<summary>Show code</summary>
<pre>data.frame(
  #State        = paste0(&quot;S&quot;, 1:7),
  Label        = state_labels_ii[1:7],
  Mean_sojourn = mean_sojourn[1:7],
  Total_rate   = round(-diag(Q_base)[1:7], 4)
)</pre>
</details>
<div class="cell-output cell-output-stdout">
<pre>                   Label Mean_sojourn Total_rate
S1               Surgery          1.5     0.6667
S2          DefinitiveRT          3.0     0.3333
S3    PostOpSurveillance         22.0     0.0455
S4       AdjuvantRT_PORT          3.0     0.3333
S5 AdjuvantChemoRT_POCRT          4.0     0.2500
S6                   NED        120.0     0.0083
S7         LR_Recurrence         12.5     0.0800</pre>
</div>
</div>
</section>
</div>
</div>
</section>
<section id="synthetic-data" class="level3">
<h3 class="anchored" data-anchor-id="synthetic-data">Synthetic Data</h3>
<p>This code block constructs the synthetic data and prints out the first few rows of the synthetic data set.</p>
<div class="cell">
<details class="code-fold">
<summary>show code</summary>
<pre># ── Patient-specific Q (covariates scale recurrence & mortality rates) ─────────
make_Q &lt;- function(age_i, size_i) {
  Q &lt;- Q_base
  age_eff  &lt;- exp(0.025 * (age_i  - 60))   # older  -&gt; higher mortality
  size_eff &lt;- exp(0.180 * (size_i -  3.0))  # larger -&gt; higher recurrence
  Q[3, 7] &lt;- Q_base[3, 7] * size_eff        # PostOpSurv -&gt; LR Recurrence
  Q[6, 7] &lt;- Q_base[6, 7] * size_eff * age_eff
  Q[6, 8] &lt;- Q_base[6, 8] * age_eff
  Q[7, 8] &lt;- Q_base[7, 8] * age_eff
  diag(Q) &lt;- 0
  diag(Q) &lt;- -rowSums(Q)
  Q
}

# ── Simulate one patient's full CTMC path ─────────────────────────────────────
sim_ctmc &lt;- function(init_s, Q_i, max_t) {
  t_vec &lt;- 0; s_vec &lt;- init_s; s &lt;- init_s; t &lt;- 0
  while (t &lt; max_t &#038;&#038; s != 8L) {
    total_rate &lt;- -Q_i[s, s]
    if (total_rate == 0) break
    t &lt;- t + rexp(1L, total_rate)
    if (t &gt; max_t) break
    probs    &lt;- pmax(Q_i[s, ], 0)
    probs[s] &lt;- 0
    s        &lt;- sample.int(8L, 1L, prob = probs)
    t_vec    &lt;- c(t_vec, t); s_vec &lt;- c(s_vec, s)
  }
  list(times = t_vec, states = s_vec)
}

# State at a given observation time (last known state)
state_at &lt;- function(traj, obs_t) {
  traj$states[max(which(traj$times &lt;= obs_t))]
}

# ── Generate panel data ────────────────────────────────────────────────────────
records &lt;- vector(&quot;list&quot;, N_PATIENTS)

for (i in seq_len(N_PATIENTS)) {
  init_s  &lt;- if (treatment[i] == &quot;Surgery&quot;) 1L else 2L
  Q_i     &lt;- make_Q(age[i], tumor_size_cm[i])
  traj    &lt;- sim_ctmc(init_s, Q_i, MAX_TIME)

  jitter  &lt;- c(0, runif(length(obs_base) - 1L, -0.5, 0.5))
  obs_t   &lt;- sort(unique(pmax(0, obs_base + jitter)))

  death_t &lt;- if (8L %in% traj$states)
    traj$times[which(traj$states == 8L)[1L]]
  else Inf

  obs_t &lt;- obs_t[obs_t &lt; death_t]
  if (is.finite(death_t) &#038;&#038; death_t &lt;= MAX_TIME)
    obs_t &lt;- c(obs_t, death_t)

  obs_s &lt;- sapply(obs_t, state_at, traj = traj)

  records[[i]] &lt;- data.frame(
    id            = i,
    time          = round(obs_t, 2),
    state         = obs_s,
    age           = age[i],
    tumor_size_cm = tumor_size_cm[i],
    DOI_mm        = DOI_mm[i],
    treatment     = treatment[i],
    stringsAsFactors = FALSE
  )
}

oscc2_data &lt;- do.call(rbind, records)
rownames(oscc2_data) &lt;- NULL
oscc2_data$state_label &lt;- state_labels_ii[oscc2_data$state]

# Save synthetic data
write.csv(oscc2_data, &quot;OSCC_HE_Synthetic_Data.csv&quot;, row.names = FALSE)

# First 10 rows
head(oscc2_data, 10)</pre>
</details>
<div class="cell-output cell-output-stdout">
<pre>   id  time state age tumor_size_cm DOI_mm treatment        state_label
1   1  0.00     1  59           2.2    7.8   Surgery            Surgery
2   1  1.38     1  59           2.2    7.8   Surgery            Surgery
3   1  1.82     1  59           2.2    7.8   Surgery            Surgery
4   1  3.01     1  59           2.2    7.8   Surgery            Surgery
5   1  6.04     3  59           2.2    7.8   Surgery PostOpSurveillance
6   1  8.50     3  59           2.2    7.8   Surgery PostOpSurveillance
7   1 11.61     3  59           2.2    7.8   Surgery PostOpSurveillance
8   1 17.91     3  59           2.2    7.8   Surgery PostOpSurveillance
9   1 24.01     3  59           2.2    7.8   Surgery PostOpSurveillance
10  1 35.86     6  59           2.2    7.8   Surgery                NED</pre>
</div>
</div>
<p>Note that two covariates, patient age and tumor size, are included in the data and encoded in the <code>make_Q</code> function used to create the synthetic data as follows:</p>
<ul>
<li><code>age_eff  &lt;- exp(0.025 * (age_i  - 60))   # older  -&gt; higher mortality</code></li>
<li><code>size_eff &lt;- exp(0.180 * (size_i -  3.0))  # larger -&gt; higher recurrence</code></li>
</ul>
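<p>As a minimal sketch of how these multipliers act (the baseline rates in <code>Q_base</code> are defined earlier in the post and not repeated here), the two effects can be computed and inspected on their own:</p>

```r
# Covariate multipliers used by make_Q (coefficients taken from the
# simulation code above; a reference patient gets multipliers of exactly 1)
cov_multipliers <- function(age_i, size_i) {
  c(
    age_eff  = exp(0.025 * (age_i  - 60)),  # older  -> higher mortality rate
    size_eff = exp(0.180 * (size_i - 3.0))  # larger -> higher recurrence rate
  )
}

cov_multipliers(60, 3.0)  # reference patient: both multipliers equal 1
cov_multipliers(70, 4.5)  # older, larger tumour: both multipliers exceed 1
```

<p>Because the effects multiply the transition rates, this is a proportional-hazards style specification: each additional year of age or centimetre of tumour size scales the relevant hazards by a constant factor.</p>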
<hr>
</section>
<section id="summary-of-data-set" class="level3">
<h3 class="anchored" data-anchor-id="summary-of-data-set">Summary of Data Set</h3>
<p>This table summarizes some of the metadata describing the synthetic data set.</p>
<div class="cell">
<details class="code-fold">
<summary>Show code for cohort summary</summary>
<pre># Collect components for footnotes
trt_counts &lt;- table(treatment)
died &lt;- oscc2_data |&gt;
  dplyr::filter(state == 8) |&gt;
  dplyr::distinct(id) |&gt;
  nrow()

# Main table: state visit counts
as.data.frame(table(State = oscc2_data$state_label)) |&gt;
  gt() |&gt;
  tab_options(table.font.size = px(12)) |&gt;
  cols_label(State = &quot;State&quot;, Freq = &quot;Observations&quot;) |&gt;
  tab_source_note(md(paste0(
    &quot;**Dataset:** &quot;, nrow(oscc2_data), &quot; observations; &quot;,
    N_PATIENTS, &quot; patients&quot;
  ))) |&gt;
  tab_source_note(md(paste0(
    &quot;**Treatment:** Surgery = &quot;, trt_counts[&quot;Surgery&quot;],
    &quot;; DefinitiveRT = &quot;, trt_counts[&quot;DefinitiveRT&quot;]
  ))) |&gt;
  tab_source_note(md(paste0(
    &quot;**Mortality:** &quot;, died, &quot; / &quot;, N_PATIENTS,
    &quot; patients reached Death (&quot;,
    round(100 * died / N_PATIENTS, 1), &quot;%)&quot;
  )))</pre>
</details>
<div class="cell-output-display">
<div id="paewahtyoq" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#paewahtyoq table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#paewahtyoq thead, #paewahtyoq tbody, #paewahtyoq tfoot, #paewahtyoq tr, #paewahtyoq td, #paewahtyoq th {
  border-style: none;
}

#paewahtyoq p {
  margin: 0;
  padding: 0;
}

#paewahtyoq .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 12px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#paewahtyoq .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#paewahtyoq .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#paewahtyoq .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#paewahtyoq .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#paewahtyoq .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#paewahtyoq .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#paewahtyoq .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#paewahtyoq .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#paewahtyoq .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#paewahtyoq .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#paewahtyoq .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#paewahtyoq .gt_spanner_row {
  border-bottom-style: hidden;
}

#paewahtyoq .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#paewahtyoq .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#paewahtyoq .gt_from_md > :first-child {
  margin-top: 0;
}

#paewahtyoq .gt_from_md > :last-child {
  margin-bottom: 0;
}

#paewahtyoq .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#paewahtyoq .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#paewahtyoq .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#paewahtyoq .gt_row_group_first td {
  border-top-width: 2px;
}

#paewahtyoq .gt_row_group_first th {
  border-top-width: 2px;
}

#paewahtyoq .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#paewahtyoq .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#paewahtyoq .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#paewahtyoq .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#paewahtyoq .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#paewahtyoq .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#paewahtyoq .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#paewahtyoq .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#paewahtyoq .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#paewahtyoq .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#paewahtyoq .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#paewahtyoq .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#paewahtyoq .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#paewahtyoq .gt_left {
  text-align: left;
}

#paewahtyoq .gt_center {
  text-align: center;
}

#paewahtyoq .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#paewahtyoq .gt_font_normal {
  font-weight: normal;
}

#paewahtyoq .gt_font_bold {
  font-weight: bold;
}

#paewahtyoq .gt_font_italic {
  font-style: italic;
}

#paewahtyoq .gt_super {
  font-size: 65%;
}

#paewahtyoq .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#paewahtyoq .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#paewahtyoq .gt_indent_1 {
  text-indent: 5px;
}

#paewahtyoq .gt_indent_2 {
  text-indent: 10px;
}

#paewahtyoq .gt_indent_3 {
  text-indent: 15px;
}

#paewahtyoq .gt_indent_4 {
  text-indent: 20px;
}

#paewahtyoq .gt_indent_5 {
  text-indent: 25px;
}

#paewahtyoq .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#paewahtyoq div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table caption-top table table-sm table-striped small" data-quarto-bootstrap="false">
<thead>
<tr class="gt_col_headings header">
<th id="State" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col">State</th>
<th id="Freq" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">Observations</th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_center" headers="State">AdjuvantChemoRT_POCRT</td>
<td class="gt_row gt_right" headers="Freq">228</td>
</tr>
<tr class="even">
<td class="gt_row gt_center" headers="State">AdjuvantRT_PORT</td>
<td class="gt_row gt_right" headers="Freq">333</td>
</tr>
<tr class="odd">
<td class="gt_row gt_center" headers="State">Death</td>
<td class="gt_row gt_right" headers="Freq">370</td>
</tr>
<tr class="even">
<td class="gt_row gt_center" headers="State">DefinitiveRT</td>
<td class="gt_row gt_right" headers="Freq">570</td>
</tr>
<tr class="odd">
<td class="gt_row gt_center" headers="State">LR_Recurrence</td>
<td class="gt_row gt_right" headers="Freq">767</td>
</tr>
<tr class="even">
<td class="gt_row gt_center" headers="State">NED</td>
<td class="gt_row gt_right" headers="Freq">4407</td>
</tr>
<tr class="odd">
<td class="gt_row gt_center" headers="State">PostOpSurveillance</td>
<td class="gt_row gt_right" headers="Freq">2381</td>
</tr>
<tr class="even">
<td class="gt_row gt_center" headers="State">Surgery</td>
<td class="gt_row gt_right" headers="Freq">1578</td>
</tr>
</tbody><tfoot>
<tr class="gt_sourcenotes odd">
<td colspan="2" class="gt_sourcenote"><strong>Dataset:</strong> 10634 observations; 1000 patients</td>
</tr>
<tr class="gt_sourcenotes even">
<td colspan="2" class="gt_sourcenote"><strong>Treatment:</strong> Surgery = 794; DefinitiveRT = 206</td>
</tr>
<tr class="gt_sourcenotes odd">
<td colspan="2" class="gt_sourcenote"><strong>Mortality:</strong> 370 / 1000 patients reached Death (37%)</td>
</tr>
</tfoot>

</table>

</div>
</div>
</div>
</section>
<section id="first-patient-trajectory" class="level3">
<h3 class="anchored" data-anchor-id="first-patient-trajectory">First Patient Trajectory</h3>
<p>This data frame shows the complete trajectory of states visited by the first synthetic patient.</p>
<div class="cell">
<details class="code-fold">
<summary>Show code for trajectory</summary>
<pre>oscc2_data[oscc2_data$id == 1,
           c(&quot;id&quot;, &quot;time&quot;, &quot;state&quot;, &quot;state_label&quot;, &quot;treatment&quot;,
             &quot;age&quot;, &quot;tumor_size_cm&quot;, &quot;DOI_mm&quot;)]</pre>
</details>
<div class="cell-output cell-output-stdout">
<pre>   id  time state        state_label treatment age tumor_size_cm DOI_mm
1   1  0.00     1            Surgery   Surgery  59           2.2    7.8
2   1  1.38     1            Surgery   Surgery  59           2.2    7.8
3   1  1.82     1            Surgery   Surgery  59           2.2    7.8
4   1  3.01     1            Surgery   Surgery  59           2.2    7.8
5   1  6.04     3 PostOpSurveillance   Surgery  59           2.2    7.8
6   1  8.50     3 PostOpSurveillance   Surgery  59           2.2    7.8
7   1 11.61     3 PostOpSurveillance   Surgery  59           2.2    7.8
8   1 17.91     3 PostOpSurveillance   Surgery  59           2.2    7.8
9   1 24.01     3 PostOpSurveillance   Surgery  59           2.2    7.8
10  1 35.86     6                NED   Surgery  59           2.2    7.8
11  1 47.91     6                NED   Surgery  59           2.2    7.8
12  1 59.89     6                NED   Surgery  59           2.2    7.8</pre>
</div>
</div>
<hr>
</section>
</section>
<section id="survival-analysis" class="level2">
<h2 class="anchored" data-anchor-id="survival-analysis">Survival Analysis</h2>
<section id="overall-survival" class="level3">
<h3 class="anchored" data-anchor-id="overall-survival">Overall Survival</h3>
<p>This next plot shows overall survival (Kaplan-Meier) curves for patients in both treatment arms. Note that surgery exhibits a slightly better overall survival curve (approximately 73% vs. 65% at five years). However, both curves show very good five-year survival probabilities.</p>
<p>Three external benchmark points are overlaid on the Kaplan–Meier curves as filled diamonds.</p>
<ul>
<li>Siegel et al. (2024) report approximately 84% five-year <em>relative</em> survival for localised oral cavity cancer in SEER (diagnoses 2013–2019); adjusting for background mortality at a median age of approximately 62 years using the SSA 2021 Period Life Table yields an estimated absolute five-year OS of ~77%, shown here against the Surgery arm curve. See the five-year adjustment note below.</li>
<li>Hosni et al. (2019) observed a three-year OS of 71% (95% CI 67–75%) in 601 OSCC patients receiving adjuvant PORT at Princess Margaret Cancer Centre, also plotted against the Surgery arm.</li>
<li>Sher et al. (2011) reported a two-year OS of 63% in 12 oral cavity patients treated with definitive IMRT at Dana-Farber Cancer Institute, shown against the Definitive RT arm.</li>
</ul>
<p>While not validating the survival curves, the benchmark points indicate that they are a plausible starting point for analysis.</p>
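<p>The SEER adjustment mentioned above can be sketched in a couple of lines. Relative survival is observed survival divided by the expected survival of a matched general-population cohort, so multiplying by expected survival recovers an approximate absolute OS. The expected five-year survival value used here (~0.92 at a median age of ~62) is an assumption read off the SSA 2021 Period Life Table, not computed:</p>

```r
# Back-of-envelope conversion of relative to absolute 5-year survival
relative_5yr <- 0.84   # Siegel et al. (2024), localised oral cavity, SEER
expected_5yr <- 0.92   # assumed background 5-year survival at median age ~62
absolute_5yr <- relative_5yr * expected_5yr
round(absolute_5yr, 2)  # 0.77, the value plotted against the Surgery arm
```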
<div class="cell">
<details class="code-fold">
<summary>Kaplan-Meier survival curves</summary>
<pre>library(survival)
library(survminer)

# One row per patient: time to death or end of follow-up
km_input &lt;- oscc2_data |&gt;
  dplyr::group_by(id, treatment) |&gt;
  dplyr::summarise(
    time  = max(time),
    event = as.integer(any(state == 8)),
    .groups = &quot;drop&quot;
  )

km_fit_2arm &lt;- survfit(Surv(time, event) ~ treatment, data = km_input)

km_plot &lt;- ggsurvplot(
  km_fit_2arm,
  data        = km_input,
  conf.int    = FALSE,
  risk.table  = TRUE,
  palette     = c(&quot;#D55E00&quot;, &quot;#0072B2&quot;),   # DefinitiveRT=orange, Surgery=blue (alphabetical strata order)
  legend.labs = c(&quot;Definitive RT&quot;, &quot;Surgery&quot;),
  xlab        = &quot;Time (months)&quot;,
  ylab        = &quot;Survival probability&quot;,
  title       = &quot;Overall Survival — Stage II OSCC&quot;,
  ylim        = c(0.25, 1),
  ggtheme     = theme_bw()
)

# Overlay external benchmark points (diamonds) on the KM plot
km_plot$plot &lt;- km_plot$plot +
  annotate(&quot;point&quot;, x = 60, y = 0.77, shape = 23, size = 4,
           fill = &quot;#0072B2&quot;, color = &quot;black&quot;) +
  annotate(&quot;text&quot;,  x = 60, y = 0.77,
           label = &quot;Siegel 2024\n(SEER ~77%)&quot;, vjust = -0.35, hjust = 0.5, size = 2.8) +
  annotate(&quot;point&quot;, x = 36, y = 0.71, shape = 23, size = 4,
           fill = &quot;#0072B2&quot;, color = &quot;black&quot;) +
  annotate(&quot;text&quot;,  x = 36, y = 0.71,
           label = &quot;Hosni 2019\n(71% at 3 yr)&quot;, vjust = -0.35, hjust = 0.5, size = 2.8) +
  annotate(&quot;point&quot;, x = 24, y = 0.63, shape = 23, size = 4,
           fill = &quot;#D55E00&quot;, color = &quot;black&quot;) +
  annotate(&quot;text&quot;,  x = 24, y = 0.63,
           label = &quot;Sher 2011\n(63% at 2 yr)&quot;, vjust = 1.6, hjust = 0.5, size = 2.8)

km_plot</pre>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><a href="https://i2.wp.com/rworks.dev/posts/oscc-patient-model/index_files/figure-html/unnamed-chunk-10-1.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-2" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/oscc-patient-model/index_files/figure-html/unnamed-chunk-10-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></a></p>
</figure>
</div>
</div>
</div>
</section>
<section id="survival-by-age-and-tumour-size" class="level3">
<h3 class="anchored" data-anchor-id="survival-by-age-and-tumour-size">Survival by Age and Tumour Size</h3>
<p>This next panel of plots shows overall survival broken out by the covariates age and tumor size. Each panel distinguishes the four possible combinations of the dichotomized covariates.</p>
<div class="cell">
<details class="code-fold">
<summary>KM curves stratified by age × tumour size</summary>
<pre>library(broom)
library(ggplot2)
library(dplyr)

# ── One row per patient with covariate categories ─────────────────────────────
# Age split at 62 (simulation mean); tumour size split at 3 cm (Stage II midpoint)
km_cov &lt;- oscc2_data |&gt;
  dplyr::group_by(id, treatment) |&gt;
  dplyr::summarise(
    time          = max(time),
    event         = as.integer(any(state == 8)),
    age           = first(age),
    tumor_size_cm = first(tumor_size_cm),
    .groups       = &quot;drop&quot;
  ) |&gt;
  dplyr::mutate(
    age_grp  = factor(ifelse(age  &lt; 62,  &quot;Age &lt; 62&quot;,   &quot;Age ≥ 62&quot;),
                      levels = c(&quot;Age &lt; 62&quot;, &quot;Age ≥ 62&quot;)),
    size_grp = factor(ifelse(tumor_size_cm &lt; 3.0, &quot;Tumour &lt; 3 cm&quot;, &quot;Tumour ≥ 3 cm&quot;),
                      levels = c(&quot;Tumour &lt; 3 cm&quot;, &quot;Tumour ≥ 3 cm&quot;)),
    covariate_group = factor(
      interaction(age_grp, size_grp, sep = &quot;, &quot;),
      levels = c(
        &quot;Age &lt; 62, Tumour &lt; 3 cm&quot;,
        &quot;Age &lt; 62, Tumour ≥ 3 cm&quot;,
        &quot;Age ≥ 62, Tumour &lt; 3 cm&quot;,
        &quot;Age ≥ 62, Tumour ≥ 3 cm&quot;
      )
    )
  )

pal4  &lt;- c(&quot;#2166AC&quot;, &quot;#92C5DE&quot;, &quot;#D6604D&quot;, &quot;#B2182B&quot;)
labs4 &lt;- c(&quot;Age &lt; 62, Tumour &lt; 3 cm&quot;,
           &quot;Age &lt; 62, Tumour ≥ 3 cm&quot;,
           &quot;Age ≥ 62, Tumour &lt; 3 cm&quot;,
           &quot;Age ≥ 62, Tumour ≥ 3 cm&quot;)

# ── Tidy survival estimates per arm ──────────────────────────────────────────
tidy_arm &lt;- function(arm, data) {
  df  &lt;- dplyr::filter(data, treatment == arm)
  fit &lt;- survfit(Surv(time, event) ~ covariate_group, data = df)
  broom::tidy(fit, conf.int = FALSE) |&gt;
    dplyr::mutate(
      treatment = arm,
      covariate_group = factor(
        sub(&quot;covariate_group=&quot;, &quot;&quot;, strata),
        levels = levels(data$covariate_group)
      )
    )
}

surv_tidy &lt;- dplyr::bind_rows(
  tidy_arm(&quot;Surgery&quot;,      km_cov),
  tidy_arm(&quot;DefinitiveRT&quot;, km_cov)
) |&gt;
  dplyr::mutate(treatment = factor(treatment,
    levels = c(&quot;Surgery&quot;, &quot;DefinitiveRT&quot;),
    labels = c(&quot;Surgery Arm&quot;, &quot;Definitive RT Arm&quot;)))

# ── Two-panel KM plot ─────────────────────────────────────────────────────────
ggplot(surv_tidy, aes(x = time, y = estimate,
                      colour = covariate_group,
                      group  = covariate_group)) +
  geom_step(linewidth = 0.8) +
  facet_wrap(~treatment, ncol = 2) +
  scale_colour_manual(values = pal4, labels = labs4) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1),
                     limits = c(0, 1), breaks = seq(0, 1, 0.25)) +
  scale_x_continuous(limits = c(0, 60), breaks = seq(0, 60, 12)) +
  labs(
    x      = &quot;Time (months)&quot;,
    y      = &quot;Survival probability&quot;,
    colour = &quot;Age / Tumour Size&quot;,
    title  = &quot;Overall Survival by Age and Tumour Size — Stage II OSCC&quot;
  ) +
  theme_bw(base_size = 11) +
  theme(
    legend.position  = &quot;bottom&quot;,
    legend.direction = &quot;horizontal&quot;,
    legend.text      = element_text(size = 9),
    legend.title     = element_text(size = 10, face = &quot;bold&quot;),
    strip.text       = element_text(size = 11, face = &quot;bold&quot;),
    plot.title       = element_text(size = 12, face = &quot;bold&quot;)
  ) +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))</pre>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><a href="https://i2.wp.com/rworks.dev/posts/oscc-patient-model/index_files/figure-html/unnamed-chunk-11-1.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-3" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/oscc-patient-model/index_files/figure-html/unnamed-chunk-11-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></a></p>
</figure>
</div>
</div>
</div>
<hr>
</section>
<section id="what-the-subgroup-curves-show" class="level3">
<h3 class="anchored" data-anchor-id="what-the-subgroup-curves-show">What the subgroup curves show</h3>
<ul>
<li><p><strong>Surgery arm</strong> (n = 794): age is the dominant predictor of survival. The youngest, smallest-tumour subgroup (Age < 62, Tumour < 3 cm; dark blue) reaches 74% at 5 years, while the youngest with larger tumours (Age < 62, Tumour ≥ 3 cm; light blue) reaches 71%. Among older patients (Age ≥ 62), those with smaller tumours (Tumour < 3 cm; orange) reach 58%, while those with larger tumours (dark red) reach 51%. Even if any counterintuitive orderings here are due to sampling variation rather than a more fundamental underlying cause, they may indicate a limitation of this particular synthetic data set.</p></li>
<li><p><strong>Definitive RT arm</strong> (n = 206): the four subgroups together contain only 206 patients, yielding wide confidence bands and coarse staircase steps; point estimates (69%, 68%, 55%, 51% for the dark-blue, light-blue, orange, and dark-red subgroups respectively) should be interpreted with caution. The broad pattern mirrors the Surgery arm in that older age is associated with lower survival, but the small-n instability makes subgroup comparisons across treatment arms unreliable.</p></li>
</ul>
</section>
<section id="progression-free-survival" class="level3">
<h3 class="anchored" data-anchor-id="progression-free-survival">Progression-Free Survival</h3>
<p>Progression-free survival (PFS) is defined here as the time from treatment start to the first occurrence of either locoregional recurrence (S7) or death (S8), whichever comes first. Patients who reached neither endpoint by month 60 are censored at their last observed time.</p>
<div class="cell">
<details class="code-fold">
<summary>Progression-free survival — Kaplan-Meier with at-risk table</summary>
<pre>library(survival)
library(survminer)

# ── Build per-patient PFS endpoint ───────────────────────────────────────────
# Event = first entry into LR_Recurrence (state 7) or Death (state 8)
pfs_events &lt;- oscc2_data |&gt;
  dplyr::filter(state %in% c(7L, 8L)) |&gt;
  dplyr::group_by(id) |&gt;
  dplyr::slice_min(time, n = 1, with_ties = FALSE) |&gt;
  dplyr::ungroup() |&gt;
  dplyr::select(id, pfs_time = time, pfs_event_type = state_label)

last_obs &lt;- oscc2_data |&gt;
  dplyr::group_by(id) |&gt;
  dplyr::slice_max(time, n = 1, with_ties = FALSE) |&gt;
  dplyr::ungroup() |&gt;
  dplyr::select(id, last_time = time, treatment)

pfs_df &lt;- last_obs |&gt;
  dplyr::left_join(pfs_events, by = &quot;id&quot;) |&gt;
  dplyr::mutate(
    pfs_time  = dplyr::coalesce(pfs_time, last_time),
    pfs_event = as.integer(!is.na(pfs_event_type))
  )

# ── Fit KM curves ─────────────────────────────────────────────────────────────
km_pfs &lt;- survfit(Surv(pfs_time, pfs_event) ~ treatment, data = pfs_df)
km_pfs_named &lt;- km_pfs
names(km_pfs_named$strata) &lt;- c(&quot;Definitive RT&quot;, &quot;Surgery&quot;)

# ── Plot with at-risk table ───────────────────────────────────────────────────
ggsurvplot(
  km_pfs_named,
  data              = pfs_df,
  palette           = c(&quot;#D55E00&quot;, &quot;#0072B2&quot;),
  conf.int          = TRUE,
  conf.int.alpha    = 0.15,
  size              = 0.9,
  risk.table        = TRUE,
  risk.table.col    = &quot;strata&quot;,
  risk.table.height = 0.25,
  tables.theme      = theme_cleantable(),
  xlab              = &quot;Time (months)&quot;,
  ylab              = &quot;Progression-free probability&quot;,
  title             = &quot;Progression-Free Survival — Stage II OSCC&quot;,
  legend.title      = &quot;Treatment&quot;,
  legend.labs       = c(&quot;Definitive RT&quot;, &quot;Surgery&quot;),
  legend            = &quot;bottom&quot;,
  surv.median.line  = &quot;hv&quot;,
  break.time.by     = 12,
  xlim              = c(0, 60),
  ylim              = c(0, 1),
  ggtheme           = theme_minimal(base_size = 13),
  fontsize          = 4
)</pre>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><a href="https://i1.wp.com/rworks.dev/posts/oscc-patient-model/index_files/figure-html/unnamed-chunk-13-1.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-4" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/oscc-patient-model/index_files/figure-html/unnamed-chunk-13-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></a></p>
</figure>
</div>
</div>
</div>
<section id="pfs-at-key-timepoints" class="level4">
<h4 class="anchored" data-anchor-id="pfs-at-key-timepoints">PFS at key timepoints</h4>
<div class="cell">
<details class="code-fold">
<summary>PFS estimates at 12, 24, 36, 48, 60 months</summary>
<pre>tbl_fun &lt;- function(s) {
  data.frame(
    Treatment = as.character(s$strata),
    Time      = s$time,
    PFS       = round(s$surv,  3),
    Lower95   = round(s$lower, 3),
    Upper95   = round(s$upper, 3)
  )
}
tbl_fun(summary(km_pfs_named, times = c(12, 24, 36, 48, 60))) |&gt;
  knitr::kable(
    col.names = c(&quot;Treatment&quot;, &quot;Month&quot;, &quot;PFS&quot;, &quot;95% CI lower&quot;, &quot;95% CI upper&quot;),
    caption   = &quot;Kaplan-Meier PFS estimates at annual checkpoints&quot;
  )</pre>
</details>
<div class="cell-output-display">
<table class="caption-top table table-sm table-striped small">
<caption>Kaplan-Meier PFS estimates at annual checkpoints</caption>
<thead>
<tr class="header">
<th style="text-align: left;">Treatment</th>
<th style="text-align: right;">Month</th>
<th style="text-align: right;">PFS</th>
<th style="text-align: right;">95% CI lower</th>
<th style="text-align: right;">95% CI upper</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">Definitive RT</td>
<td style="text-align: right;">12</td>
<td style="text-align: right;">0.631</td>
<td style="text-align: right;">0.569</td>
<td style="text-align: right;">0.701</td>
</tr>
<tr class="even">
<td style="text-align: left;">Definitive RT</td>
<td style="text-align: right;">24</td>
<td style="text-align: right;">0.573</td>
<td style="text-align: right;">0.509</td>
<td style="text-align: right;">0.645</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Definitive RT</td>
<td style="text-align: right;">36</td>
<td style="text-align: right;">0.544</td>
<td style="text-align: right;">0.480</td>
<td style="text-align: right;">0.616</td>
</tr>
<tr class="even">
<td style="text-align: left;">Definitive RT</td>
<td style="text-align: right;">48</td>
<td style="text-align: right;">0.519</td>
<td style="text-align: right;">0.455</td>
<td style="text-align: right;">0.592</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Definitive RT</td>
<td style="text-align: right;">60</td>
<td style="text-align: right;">0.449</td>
<td style="text-align: right;">0.386</td>
<td style="text-align: right;">0.523</td>
</tr>
<tr class="even">
<td style="text-align: left;">Surgery</td>
<td style="text-align: right;">12</td>
<td style="text-align: right;">0.783</td>
<td style="text-align: right;">0.755</td>
<td style="text-align: right;">0.813</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Surgery</td>
<td style="text-align: right;">24</td>
<td style="text-align: right;">0.683</td>
<td style="text-align: right;">0.651</td>
<td style="text-align: right;">0.716</td>
</tr>
<tr class="even">
<td style="text-align: left;">Surgery</td>
<td style="text-align: right;">36</td>
<td style="text-align: right;">0.617</td>
<td style="text-align: right;">0.584</td>
<td style="text-align: right;">0.652</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Surgery</td>
<td style="text-align: right;">48</td>
<td style="text-align: right;">0.564</td>
<td style="text-align: right;">0.531</td>
<td style="text-align: right;">0.600</td>
</tr>
<tr class="even">
<td style="text-align: left;">Surgery</td>
<td style="text-align: right;">60</td>
<td style="text-align: right;">0.511</td>
<td style="text-align: right;">0.477</td>
<td style="text-align: right;">0.548</td>
</tr>
</tbody>
</table>
</div>
</div>
</section>
<section id="what-the-pfs-curves-show" class="level4">
<h4 class="anchored" data-anchor-id="what-the-pfs-curves-show">What the PFS curves show</h4>
<ul>
<li>Surgery achieves notably higher early PFS: approximately 78% at 12 months versus 63% for Definitive RT — a 15 percentage point difference driven by the higher probability of early locoregional treatment failure in the RT arm.</li>
<li>The gap narrows progressively over the 60-month horizon, with Surgery at 51% and Definitive RT at 45% at month 60.</li>
<li>The wider confidence band on the Definitive RT curve reflects the smaller sample size in that arm.</li>
</ul>
</section>
</section>
</section>
<section id="time-patients-spend-in-each-health-state" class="level2">
<h2 class="anchored" data-anchor-id="time-patients-spend-in-each-health-state">Time Patients Spend in Each Health State</h2>
<p>The expected total time patients spend in each state over the 60-month horizon drives the economic analysis. The ability to make this calculation is the primary motivation for modeling the treatment regimen as a continuous-time Markov chain.</p>
<div class="cell">
<details class="code-fold">
<summary>State occupancy function</summary>
<pre>library(expm)

# Adapted from plot_msm_state_occupancy() (Breast_Cancer_v3.qmd).
# Accepts Q_base directly instead of a fitted msm model, and accepts
# a start_state argument so the two treatment arms can be compared.
plot_state_occupancy &lt;- function(Q,
                                 start_state  = 1,
                                 tmax         = 60,
                                 tstep        = 0.5,
                                 state_labels = NULL,
                                 title        = &quot;State Occupancy Probabilities&quot;) {
  n_states    &lt;- nrow(Q)
  start_probs &lt;- rep(0, n_states)
  start_probs[start_state] &lt;- 1
  time &lt;- seq(0, tmax, by = tstep)

  occ &lt;- matrix(NA, nrow = length(time), ncol = n_states)
  colnames(occ) &lt;- paste0(&quot;S&quot;, seq_len(n_states))
  for (i in seq_along(time))
    occ[i, ] &lt;- start_probs %*% expm(Q * time[i])

  df_long &lt;- as.data.frame(occ) |&gt;
    dplyr::mutate(time = time) |&gt;
    tidyr::pivot_longer(-time, names_to = &quot;state&quot;, values_to = &quot;prob&quot;)

  if (!is.null(state_labels))
    df_long$state &lt;- factor(df_long$state,
      levels = paste0(&quot;S&quot;, seq_len(n_states)),
      labels = state_labels)

  ggplot(df_long, aes(x = time, y = prob, colour = state)) +
    geom_line(linewidth = 1.0) +
    labs(x = &quot;Time (months)&quot;, y = &quot;Probability&quot;,
         title = title, colour = &quot;State&quot;) +
    theme_bw(base_size = 12)
}

# Expected time in each state: trapezoid integral of P(t) over 0-60 months
compute_state_time &lt;- function(Q, start_state, tmax = 60, tstep = 0.5) {
  n_states    &lt;- nrow(Q)
  start_probs &lt;- rep(0, n_states)
  start_probs[start_state] &lt;- 1
  time &lt;- seq(0, tmax, by = tstep)
  occ  &lt;- matrix(NA, nrow = length(time), ncol = n_states)
  for (i in seq_along(time))
    occ[i, ] &lt;- start_probs %*% expm(Q * time[i])
  apply(occ, 2, function(p)
    sum(diff(time) * (p[-1] + p[-length(p)]) / 2))
}</pre>
</details>
</div>
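<p>The trapezoid rule inside <code>compute_state_time()</code> can be sanity-checked against a case with a closed-form answer. The sketch below is base R only; the two-state chain and its rate are hypothetical illustrations, not part of the fitted model. It integrates the occupancy of a single transient state and compares the result with the analytic expected time:</p>

```r
# Hypothetical two-state chain: Alive -> Dead at constant rate lambda.
# Occupancy of Alive then has the closed form P(t) = exp(-lambda * t).
lambda <- 0.05                    # assumed monthly transition rate
tmax   <- 60
tstep  <- 0.5
time   <- seq(0, tmax, by = tstep)
p_alive <- exp(-lambda * time)

# Trapezoid integral of P(t): the same rule compute_state_time() applies
trap_months <- sum(diff(time) * (p_alive[-1] + p_alive[-length(p_alive)]) / 2)

# Analytic expected months in Alive over [0, tmax]
exact_months <- (1 - exp(-lambda * tmax)) / lambda

abs(trap_months - exact_months)   # discretisation error, well below 0.01
```

<p>With a 0.5-month step the discretisation error is negligible relative to the precision of the utility weights, which supports the step size used above.</p>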
<div class="cell">
<details class="code-fold">
<summary>Expected state time by arm</summary>
<pre>time_surg &lt;- compute_state_time(Q_base, start_state = 1)
time_def  &lt;- compute_state_time(Q_base, start_state = 2)

data.frame(
  State      = paste0(&quot;S&quot;, 1:8),
  Label      = state_labels_ii,
  Surgery_mo = round(time_surg, 2),
  DefRT_mo   = round(time_def,  2)
)</pre>
</details>
<div class="cell-output cell-output-stdout">
<pre>  State                 Label Surgery_mo DefRT_mo
1    S1               Surgery       1.51     0.00
2    S2          DefinitiveRT       0.00     3.01
3    S3    PostOpSurveillance      11.96     0.00
4    S4       AdjuvantRT_PORT       0.75     0.00
5    S5 AdjuvantChemoRT_POCRT       0.60     0.00
6    S6                   NED      28.83    35.54
7    S7         LR_Recurrence       4.09     5.67
8    S8                 Death      12.26    15.78</pre>
</div>
</div>
<p>The state occupancy plots show the probability of being in each state as time progresses. For both plots, focusing on a particular time point gives an estimate of a patient’s probable health state.</p>
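<p>Reading a probability off an occupancy curve amounts to evaluating <code>p(t) = p(0) %*% expm(Q * t)</code>. The minimal base-R illustration below uses a hypothetical two-state generator and a naive truncated power series in place of the <code>expm</code> package (adequate only when <code>Q * t</code> has small entries; the real model above uses <code>expm()</code>):</p>

```r
# Naive matrix exponential via truncated power series; fine for small Q * t.
mat_exp <- function(A, n_terms = 30) {
  S    <- diag(nrow(A))
  term <- diag(nrow(A))
  for (k in seq_len(n_terms)) {
    term <- term %*% A / k
    S    <- S + term
  }
  S
}

# Hypothetical generator: state 1 -> state 2 at rate 0.05 per month
Q  <- matrix(c(-0.05, 0.05,
                0.00, 0.00), nrow = 2, byrow = TRUE)
p0 <- c(1, 0)                      # start in state 1

p12 <- p0 %*% mat_exp(Q * 12)      # occupancy at month 12
p12[1]                             # matches the closed form exp(-0.05 * 12)
```

<p>The row of occupancy probabilities always sums to one, a useful invariant to check when experimenting with generator matrices.</p>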
<div class="columns">
<div class="column" style="width:45%;">
<section id="state-occupancy-surgery" class="level4">
<h4 class="anchored" data-anchor-id="state-occupancy-surgery">State Occupancy: Surgery</h4>
<div class="cell">
<details class="code-fold">
<summary>Surgery occupancy plot</summary>
<pre>plot_state_occupancy(
  Q            = Q_base,
  start_state  = 1,
  state_labels = state_labels_ii,
  title        = &quot;State Occupancy Probabilities — Surgery Arm&quot;
)</pre>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><a href="https://i2.wp.com/rworks.dev/posts/oscc-patient-model/index_files/figure-html/unnamed-chunk-18-1.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-5" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/oscc-patient-model/index_files/figure-html/unnamed-chunk-18-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></a></p>
</figure>
</div>
</div>
</div>
</section>
</div><div class="column" style="width:10%;">

</div><div class="column" style="width:45%;">
<section id="state-occupancy-definitive-rt" class="level4">
<h4 class="anchored" data-anchor-id="state-occupancy-definitive-rt">State Occupancy: Definitive RT</h4>
<div class="cell">
<details class="code-fold">
<summary>Definitive RT occupancy plot</summary>
<pre>plot_state_occupancy(
  Q            = Q_base,
  start_state  = 2,
  state_labels = state_labels_ii,
  title        = &quot;State Occupancy Probabilities — Definitive RT Arm&quot;
)</pre>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><a href="https://i0.wp.com/rworks.dev/posts/oscc-patient-model/index_files/figure-html/unnamed-chunk-19-1.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-6" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/oscc-patient-model/index_files/figure-html/unnamed-chunk-19-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></a></p>
</figure>
</div>
</div>
</div>
</section>
</div>
</div>
<hr>
</section>
<section id="health-economic-evaluation-patient-perspective" class="level2">
<h2 class="anchored" data-anchor-id="health-economic-evaluation-patient-perspective">Health Economic Evaluation — Patient Perspective</h2>
<p>The health economics model presented here uses QALYs calculated with EQ-5D utilities. Although these utilities are abstractions, they provide a measure, however imperfect, of how patients treated for OSCC have rated the ensemble of experiences associated with each possible health state.</p>
<p>The model does not calculate financial costs, for the following reasons:</p>
<ol type="1">
<li>In the United States, there is no practical way to estimate the final direct cost to patients who are covered by private insurance or Medicare. The U.S. health care system is set up so that the final cost of treatments borne by patients is only made known after the treatments have been received.</li>
<li>For patients with adequate insurance, cost is unlikely to be directly relevant to treatment decisions.</li>
<li>For patients without insurance, the financial burden of treatment would dominate all other considerations.</li>
<li>From a patient’s point of view, there is merit in clarifying the quality of life to be expected after undergoing a particular treatment. For example, it is not difficult to imagine that elderly patients would choose palliative care over a course of chemotherapy.</li>
</ol>
<section id="quality-adjusted-life-years-qalys" class="level3">
<h3 class="anchored" data-anchor-id="quality-adjusted-life-years-qalys">Quality-Adjusted Life Years (QALYs)</h3>
<p>A Quality-Adjusted Life Year (QALY) combines the quantity and quality of life into a single metric. One QALY represents one year lived in perfect health; a year spent in a health state with diminished quality counts for less than one QALY in proportion to that state’s utility weight.</p>
<p>In this model, QALYs are calculated in three steps:</p>
<ol type="1">
<li><p><strong>Assign utility weights:</strong> Each of the eight health states is assigned an EQ-5D utility value between 0 (death) and 1 (perfect health), drawn from peer-reviewed literature on head-and-neck cancer patients.</p></li>
<li><p><strong>Estimate time in each state:</strong> The continuous-time Markov chain (CTMC) simulation provides the expected number of months the cohort spends in each state over a 60-month horizon, separately for the Surgery arm and the Definitive Radiation Therapy arm.</p></li>
<li><p><strong>Compute utility-weighted life years:</strong> For each state, the time (in months) is multiplied by its utility weight and the products are summed across all states. Dividing by 12 converts the result to years:</p></li>
</ol>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BTotal%20QALYs%7D%20=%20%5Cfrac%7B1%7D%7B12%7D%20%5Csum_%7Bi=1%7D%5E%7B8%7D%20%5Ctext%7Btime%7D_i%20%5Ctimes%20u_i%0A"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Ctext%7Btime%7D_i"> is the expected months in state <img src="https://latex.codecogs.com/png.latex?i"> and <img src="https://latex.codecogs.com/png.latex?u_i"> is its EQ-5D utility. The two treatment arms are evaluated on the same 60-month horizon, allowing a direct QALY comparison from the patient’s perspective.</p>
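<p>The formula can be verified by hand. The sketch below reproduces the Surgery-arm total using the expected months per state from the state-time table earlier and the EQ-5D utility weights of the next section; the numbers are copied from those tables rather than recomputed from the model:</p>

```r
# Surgery-arm expected months in each state (from the state-time table above)
months <- c(S1 = 1.51, S2 = 0.00, S3 = 11.96, S4 = 0.75,
            S5 = 0.60, S6 = 28.83, S7 = 4.09, S8 = 12.26)

# EQ-5D utility weights for the eight states (0 = death, 1 = perfect health)
u <- c(S1 = 0.60, S2 = 0.83, S3 = 0.72, S4 = 0.83,
       S5 = 0.775, S6 = 0.82, S7 = 0.55, S8 = 0.00)

total_qalys <- sum(months * u) / 12   # utility-weighted months, converted to years
round(total_qalys, 3)                 # about 3.041 QALYs over the 60-month horizon
```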
</section>
<section id="utility-values" class="level3">
<h3 class="anchored" data-anchor-id="utility-values">Utility Values</h3>
<p>Health state utility values map each model state to a preference-based quality-of-life weight on the 0–1 scale (0 = death, 1 = perfect health). The weights below are EQ-5D-based defaults drawn from the published literature. S6 (NED) carries the highest utility among living states and serves as the reference; a NED-normalized column (NED = 1.00) is included for sensitivity work.</p>
<p>The sources for the utility values are provided as comments in the code below. Complete references are provided in the Reference section at the end.</p>
<div class="cell">
<details class="code-fold">
<summary>Define and display utility weights</summary>
<pre># Health state utility values (EQ-5D scale, 0 = death, 1 = perfect health).
# S6 (NED) is the reference (highest) living state.
# Sources: 
# S1 de Almeida et al. 2014; Noel et al. 2015
# S2 Sprave et al. 2022 (during-RT EQ-5D ~0.83, n=366); Truong/RTOG 0522 2017 retained for S5
# S3 Govers et al. 2016
# S4 Sprave et al. 2022 (during-RT EQ-5D ~0.83, n=366); Govers et al. 2016
# S5 Truong et al. / RTOG 0522 2017 (primary); Sprave et al. 2022 adjuvant CRT data support ~0.83 but POCRT toxicity justifies retaining 0.775 — see inline note
# S6 Noel et al. 2015 (reference state)
# S7 Meregaglia & Cairns 2017 (systematic review confirming evidence gap; only patient EQ-5D for recurrence found is median 0.70, del Barco et al. 2016, palliative/metastatic context); 0.55 is modeller's assumption
# S8 Convention (absorbing state)
utility &lt;- c(
  S1 = 0.60,   # Surgery: acute peri-operative period
  S2 = 0.83,   # DefinitiveRT: during 6-7 week RT course (Sprave et al. 2022, mean EQ-5D at RT completion = 0.830, n=366)
  S3 = 0.72,   # PostOpSurveillance: recovery phase
  S4 = 0.83,   # PORT: adjuvant RT toxicity (Sprave et al. 2022, mean EQ-5D at RT completion = 0.830, n=366)
  S5 = 0.775, # POCRT: concurrent chemoRT (revised from 0.60). Primary source: Truong/RTOG 0522 3-month EQ-5D proxy (CIS arm 0.78, CET/CIS arm 0.77); end-of-treatment value collected but not reported in paper. Comparison: Sprave et al. (2022) adjuvant CRT cohort had baseline HI = 0.849 with no significant within-group change to RT completion, and CRT vs RT-alone did not differ at RT completion (p = 0.624); the adjuvant cohort data would support ~0.83. Value retained at 0.775 as a deliberate conservative estimate: POCRT involves high-dose cisplatin concurrent with post-surgical RT (higher toxicity than Sprave 2022 mixed RT cohort), and RTOG 0522 provides the only patient-reported utility from an actual CRT trial. A sensitivity analysis using 0.83 would narrow the Surgery vs. Def RT QALY difference by ~0.009 QALYs (0.598 months in S5).
  S6 = 0.82,   # NED: reference state (Noel 2015)
  S7 = 0.55,   # LR Recurrence: modeller's assumption (no directly reported mean EQ-5D for LR recurrence; Meregaglia &#038; Cairns 2017 confirms evidence gap; only available patient EQ-5D is median 0.70 from del Barco 2016 in palliative recurrent/metastatic HNC)
  S8 = 0.00    # Death: absorbing state
)

df_utility &lt;- data.frame(
  State        = paste0(&quot;S&quot;, 1:8),
  Label        = state_labels_ii,
  Primary_Source = c(
    &quot;de Almeida (2014); Noel (2015)&quot;,
    &quot;Sprave (2022)&quot;,
    &quot;Govers (2016)&quot;,
    &quot;Sprave (2022)&quot;,
    &quot;Truong / RTOG 0522 (2017)&quot;,
    &quot;Noel (2015)&quot;,
    &quot;Modeller's assumption&quot;,
    &quot;Convention&quot;
  ),
  Utility_EQ5D = utility,
  Utility_NED1 = round(utility / utility[&quot;S6&quot;], 3),
  QALY_Surgery = round(time_surg * utility / 12, 3),
  QALY_DefRT   = round(time_def  * utility / 12, 3),
  QALY_Diff    = round((time_surg - time_def) * utility / 12, 3)
)
rownames(df_utility) &lt;- NULL

df_utility |&gt;
  gt() |&gt;
  tab_options(table.font.size = px(10)) |&gt;
  cols_label(
    State        = &quot;State&quot;,
    Label        = &quot;Health State&quot;,
    Primary_Source = &quot;Primary Source&quot;,
    Utility_EQ5D = &quot;EQ-5D Utility&quot;,
    Utility_NED1 = &quot;NED-Normalised&quot;,
    QALY_Surgery = &quot;QALYs — Surgery&quot;,
    QALY_DefRT   = &quot;QALYs — Definitive RT&quot;,
    QALY_Diff    = &quot;Difference (Surg − RT)&quot;
  ) |&gt;
  tab_style(
    style = cell_text(weight = &quot;bold&quot;),
    locations = cells_body(rows = State == &quot;S6&quot;)
  ) |&gt;
  tab_style(
    style = cell_text(color = &quot;#27AE60&quot;, weight = &quot;bold&quot;),
    locations = cells_body(columns = QALY_Diff, rows = QALY_Diff &gt; 0)
  ) |&gt;
  tab_style(
    style = cell_text(color = &quot;#C0392B&quot;, weight = &quot;bold&quot;),
    locations = cells_body(columns = QALY_Diff, rows = QALY_Diff &lt; 0)
  ) |&gt;
  tab_spanner(
    label   = &quot;Utility&quot;,
    columns = c(Utility_EQ5D, Utility_NED1)
  ) |&gt;
  tab_spanner(
    label   = &quot;QALYs by Treatment Arm&quot;,
    columns = c(QALY_Surgery, QALY_DefRT, QALY_Diff)
  ) |&gt;
  grand_summary_rows(
    columns  = c(QALY_Surgery, QALY_DefRT, QALY_Diff),
    fns      = list(Total ~ sum(.)),
    fmt      = ~ fmt_number(., decimals = 3)
  ) |&gt;
  tab_footnote(
    footnote  = md(&quot;Modeller's assumption. No directly reported patient EQ-5D for LR recurrence exists; best available evidence is median 0.70 (del Barco et al. 2016, palliative/metastatic context). See Meregaglia & Cairns (2017) in References.&quot;),
    locations = cells_body(columns = Primary_Source, rows = State == &quot;S7&quot;)
  ) |&gt;
  tab_source_note(&quot;NED-Normalised: utility relative to S6 NED = 1.00. QALYs = utility-weighted months / 12 over a 60-month horizon. Difference = Surgery − Definitive RT; green = Surgery advantage, red = RT advantage.&quot;)</pre>
</details>
<div class="cell-output-display">
<div id="qrbcyoppds" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#qrbcyoppds table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#qrbcyoppds thead, #qrbcyoppds tbody, #qrbcyoppds tfoot, #qrbcyoppds tr, #qrbcyoppds td, #qrbcyoppds th {
  border-style: none;
}

#qrbcyoppds p {
  margin: 0;
  padding: 0;
}

#qrbcyoppds .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 10px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#qrbcyoppds .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#qrbcyoppds .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#qrbcyoppds .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#qrbcyoppds .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#qrbcyoppds .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#qrbcyoppds .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#qrbcyoppds .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#qrbcyoppds .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#qrbcyoppds .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#qrbcyoppds .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#qrbcyoppds .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#qrbcyoppds .gt_spanner_row {
  border-bottom-style: hidden;
}

#qrbcyoppds .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#qrbcyoppds .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#qrbcyoppds .gt_from_md > :first-child {
  margin-top: 0;
}

#qrbcyoppds .gt_from_md > :last-child {
  margin-bottom: 0;
}

#qrbcyoppds .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#qrbcyoppds .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#qrbcyoppds .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#qrbcyoppds .gt_row_group_first td {
  border-top-width: 2px;
}

#qrbcyoppds .gt_row_group_first th {
  border-top-width: 2px;
}

#qrbcyoppds .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#qrbcyoppds .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#qrbcyoppds .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#qrbcyoppds .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#qrbcyoppds .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#qrbcyoppds .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#qrbcyoppds .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#qrbcyoppds .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#qrbcyoppds .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#qrbcyoppds .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#qrbcyoppds .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#qrbcyoppds .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#qrbcyoppds .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#qrbcyoppds .gt_left {
  text-align: left;
}

#qrbcyoppds .gt_center {
  text-align: center;
}

#qrbcyoppds .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#qrbcyoppds .gt_font_normal {
  font-weight: normal;
}

#qrbcyoppds .gt_font_bold {
  font-weight: bold;
}

#qrbcyoppds .gt_font_italic {
  font-style: italic;
}

#qrbcyoppds .gt_super {
  font-size: 65%;
}

#qrbcyoppds .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#qrbcyoppds .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#qrbcyoppds .gt_indent_1 {
  text-indent: 5px;
}

#qrbcyoppds .gt_indent_2 {
  text-indent: 10px;
}

#qrbcyoppds .gt_indent_3 {
  text-indent: 15px;
}

#qrbcyoppds .gt_indent_4 {
  text-indent: 20px;
}

#qrbcyoppds .gt_indent_5 {
  text-indent: 25px;
}

#qrbcyoppds .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#qrbcyoppds div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table caption-top table table-sm table-striped small" data-quarto-bootstrap="false">
<colgroup>
<col style="width: 11%">
<col style="width: 11%">
<col style="width: 11%">
<col style="width: 11%">
<col style="width: 11%">
<col style="width: 11%">
<col style="width: 11%">
<col style="width: 11%">
<col style="width: 11%">
</colgroup>
<thead>
<tr class="gt_col_headings gt_spanner_row header">
<th rowspan="2" id="a::stub" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col"></th>
<th rowspan="2" id="State" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">State</th>
<th rowspan="2" id="Label" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">Health State</th>
<th rowspan="2" id="Primary_Source" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">Primary Source</th>
<th colspan="2" id="Utility" class="gt_center gt_columns_top_border gt_column_spanner_outer" data-quarto-table-cell-role="th" scope="colgroup"><div class="gt_column_spanner">
Utility
</div></th>
<th colspan="3" id="QALYs by Treatment Arm" class="gt_center gt_columns_top_border gt_column_spanner_outer" data-quarto-table-cell-role="th" scope="colgroup"><div class="gt_column_spanner">
QALYs by Treatment Arm
</div></th>
</tr>
<tr class="gt_col_headings even">
<th id="Utility_EQ5D" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">EQ-5D Utility</th>
<th id="Utility_NED1" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">NED-Normalised</th>
<th id="QALY_Surgery" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">QALYs — Surgery</th>
<th id="QALY_DefRT" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">QALYs — Definitive RT</th>
<th id="QALY_Diff" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">Difference (Surg − RT)</th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<th id="stub_1_1" class="gt_row gt_left gt_stub" data-quarto-table-cell-role="th" scope="row"></th>
<td class="gt_row gt_left" headers="stub_1_1 State">S1</td>
<td class="gt_row gt_left" headers="stub_1_1 Label">Surgery</td>
<td class="gt_row gt_left" headers="stub_1_1 Primary_Source">de Almeida (2014); Noel (2015)</td>
<td class="gt_row gt_right" headers="stub_1_1 Utility_EQ5D">0.600</td>
<td class="gt_row gt_right" headers="stub_1_1 Utility_NED1">0.732</td>
<td class="gt_row gt_right" headers="stub_1_1 QALY_Surgery">0.076</td>
<td class="gt_row gt_right" headers="stub_1_1 QALY_DefRT">0.000</td>
<td class="gt_row gt_right" headers="stub_1_1 QALY_Diff" style="color: #27AE60; font-weight: bold">0.076</td>
</tr>
<tr class="even">
<th id="stub_1_2" class="gt_row gt_left gt_stub" data-quarto-table-cell-role="th" scope="row"></th>
<td class="gt_row gt_left" headers="stub_1_2 State">S2</td>
<td class="gt_row gt_left" headers="stub_1_2 Label">DefinitiveRT</td>
<td class="gt_row gt_left" headers="stub_1_2 Primary_Source">Sprave (2022)</td>
<td class="gt_row gt_right" headers="stub_1_2 Utility_EQ5D">0.830</td>
<td class="gt_row gt_right" headers="stub_1_2 Utility_NED1">1.012</td>
<td class="gt_row gt_right" headers="stub_1_2 QALY_Surgery">0.000</td>
<td class="gt_row gt_right" headers="stub_1_2 QALY_DefRT">0.208</td>
<td class="gt_row gt_right" headers="stub_1_2 QALY_Diff" style="color: #C0392B; font-weight: bold">-0.208</td>
</tr>
<tr class="odd">
<th id="stub_1_3" class="gt_row gt_left gt_stub" data-quarto-table-cell-role="th" scope="row"></th>
<td class="gt_row gt_left" headers="stub_1_3 State">S3</td>
<td class="gt_row gt_left" headers="stub_1_3 Label">PostOpSurveillance</td>
<td class="gt_row gt_left" headers="stub_1_3 Primary_Source">Govers (2016)</td>
<td class="gt_row gt_right" headers="stub_1_3 Utility_EQ5D">0.720</td>
<td class="gt_row gt_right" headers="stub_1_3 Utility_NED1">0.878</td>
<td class="gt_row gt_right" headers="stub_1_3 QALY_Surgery">0.718</td>
<td class="gt_row gt_right" headers="stub_1_3 QALY_DefRT">0.000</td>
<td class="gt_row gt_right" headers="stub_1_3 QALY_Diff" style="color: #27AE60; font-weight: bold">0.718</td>
</tr>
<tr class="even">
<th id="stub_1_4" class="gt_row gt_left gt_stub" data-quarto-table-cell-role="th" scope="row"></th>
<td class="gt_row gt_left" headers="stub_1_4 State">S4</td>
<td class="gt_row gt_left" headers="stub_1_4 Label">AdjuvantRT_PORT</td>
<td class="gt_row gt_left" headers="stub_1_4 Primary_Source">Sprave (2022)</td>
<td class="gt_row gt_right" headers="stub_1_4 Utility_EQ5D">0.830</td>
<td class="gt_row gt_right" headers="stub_1_4 Utility_NED1">1.012</td>
<td class="gt_row gt_right" headers="stub_1_4 QALY_Surgery">0.052</td>
<td class="gt_row gt_right" headers="stub_1_4 QALY_DefRT">0.000</td>
<td class="gt_row gt_right" headers="stub_1_4 QALY_Diff" style="color: #27AE60; font-weight: bold">0.052</td>
</tr>
<tr class="odd">
<th id="stub_1_5" class="gt_row gt_left gt_stub" data-quarto-table-cell-role="th" scope="row"></th>
<td class="gt_row gt_left" headers="stub_1_5 State">S5</td>
<td class="gt_row gt_left" headers="stub_1_5 Label">AdjuvantChemoRT_POCRT</td>
<td class="gt_row gt_left" headers="stub_1_5 Primary_Source">Truong / RTOG 0522 (2017)</td>
<td class="gt_row gt_right" headers="stub_1_5 Utility_EQ5D">0.775</td>
<td class="gt_row gt_right" headers="stub_1_5 Utility_NED1">0.945</td>
<td class="gt_row gt_right" headers="stub_1_5 QALY_Surgery">0.039</td>
<td class="gt_row gt_right" headers="stub_1_5 QALY_DefRT">0.000</td>
<td class="gt_row gt_right" headers="stub_1_5 QALY_Diff" style="color: #27AE60; font-weight: bold">0.039</td>
</tr>
<tr class="even">
<th id="stub_1_6" class="gt_row gt_left gt_stub" data-quarto-table-cell-role="th" scope="row"></th>
<td class="gt_row gt_left" headers="stub_1_6 State" style="font-weight: bold">S6</td>
<td class="gt_row gt_left" headers="stub_1_6 Label" style="font-weight: bold">NED</td>
<td class="gt_row gt_left" headers="stub_1_6 Primary_Source" style="font-weight: bold">Noel (2015)</td>
<td class="gt_row gt_right" headers="stub_1_6 Utility_EQ5D" style="font-weight: bold">0.820</td>
<td class="gt_row gt_right" headers="stub_1_6 Utility_NED1" style="font-weight: bold">1.000</td>
<td class="gt_row gt_right" headers="stub_1_6 QALY_Surgery" style="font-weight: bold">1.970</td>
<td class="gt_row gt_right" headers="stub_1_6 QALY_DefRT" style="font-weight: bold">2.429</td>
<td class="gt_row gt_right" headers="stub_1_6 QALY_Diff" style="font-weight: bold; color: #C0392B">-0.459</td>
</tr>
<tr class="odd">
<th id="stub_1_7" class="gt_row gt_left gt_stub" data-quarto-table-cell-role="th" scope="row"></th>
<td class="gt_row gt_left" headers="stub_1_7 State">S7</td>
<td class="gt_row gt_left" headers="stub_1_7 Label">LR_Recurrence</td>
<td class="gt_row gt_left" headers="stub_1_7 Primary_Source">Modeller's assumption<span class="gt_footnote_marks" style="white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;"><sup>1</sup></span></td>
<td class="gt_row gt_right" headers="stub_1_7 Utility_EQ5D">0.550</td>
<td class="gt_row gt_right" headers="stub_1_7 Utility_NED1">0.671</td>
<td class="gt_row gt_right" headers="stub_1_7 QALY_Surgery">0.188</td>
<td class="gt_row gt_right" headers="stub_1_7 QALY_DefRT">0.260</td>
<td class="gt_row gt_right" headers="stub_1_7 QALY_Diff" style="color: #C0392B; font-weight: bold">-0.072</td>
</tr>
<tr class="even">
<th id="stub_1_8" class="gt_row gt_left gt_stub" data-quarto-table-cell-role="th" scope="row"></th>
<td class="gt_row gt_left" headers="stub_1_8 State">S8</td>
<td class="gt_row gt_left" headers="stub_1_8 Label">Death</td>
<td class="gt_row gt_left" headers="stub_1_8 Primary_Source">Convention</td>
<td class="gt_row gt_right" headers="stub_1_8 Utility_EQ5D">0.000</td>
<td class="gt_row gt_right" headers="stub_1_8 Utility_NED1">0.000</td>
<td class="gt_row gt_right" headers="stub_1_8 QALY_Surgery">0.000</td>
<td class="gt_row gt_right" headers="stub_1_8 QALY_DefRT">0.000</td>
<td class="gt_row gt_right" headers="stub_1_8 QALY_Diff">0.000</td>
</tr>
<tr class="odd">
<th id="grand_summary_stub_1" class="gt_row gt_left gt_stub gt_grand_summary_row gt_first_grand_summary_row gt_last_summary_row" data-quarto-table-cell-role="th" scope="row">Total</th>
<td class="gt_row gt_left gt_grand_summary_row gt_first_grand_summary_row gt_last_summary_row" headers="grand_summary_stub_1 State">—</td>
<td class="gt_row gt_left gt_grand_summary_row gt_first_grand_summary_row gt_last_summary_row" headers="grand_summary_stub_1 Label">—</td>
<td class="gt_row gt_left gt_grand_summary_row gt_first_grand_summary_row gt_last_summary_row" headers="grand_summary_stub_1 Primary_Source">—</td>
<td class="gt_row gt_right gt_grand_summary_row gt_first_grand_summary_row gt_last_summary_row" headers="grand_summary_stub_1 Utility_EQ5D">—</td>
<td class="gt_row gt_right gt_grand_summary_row gt_first_grand_summary_row gt_last_summary_row" headers="grand_summary_stub_1 Utility_NED1">—</td>
<td class="gt_row gt_right gt_grand_summary_row gt_first_grand_summary_row gt_last_summary_row" headers="grand_summary_stub_1 QALY_Surgery">3.043</td>
<td class="gt_row gt_right gt_grand_summary_row gt_first_grand_summary_row gt_last_summary_row" headers="grand_summary_stub_1 QALY_DefRT">2.897</td>
<td class="gt_row gt_right gt_grand_summary_row gt_first_grand_summary_row gt_last_summary_row" headers="grand_summary_stub_1 QALY_Diff">0.146</td>
</tr>
</tbody><tfoot>
<tr class="gt_footnotes odd">
<td colspan="9" class="gt_footnote"><span class="gt_footnote_marks" style="white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;"><sup>1</sup></span> Modeller’s assumption. No directly reported patient EQ-5D for LR recurrence exists; best available evidence is median 0.70 (del Barco et al. 2016, palliative/metastatic context). See Meregaglia & Cairns (2017) in References.</td>
</tr>
<tr class="gt_sourcenotes even">
<td colspan="9" class="gt_sourcenote">NED-Normalised: utility relative to S6 NED = 1.00. QALYs = utility-weighted months / 12 over a 60-month horizon. Difference = Surgery − Definitive RT; green = Surgery advantage, red = RT advantage.</td>
</tr>
</tfoot>

</table>

</div>
</div>
</div>
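<p>The NED-normalised column in the table is a simple rescaling of the EQ-5D column. A minimal sketch of the convention stated in the source note, using the rounded utilities copied from the table (0.820 is the S6 NED utility):</p>
<div class="cell">
<details class="code-fold">
<summary>NED-normalised utility check (illustrative)</summary>
<pre># Rounded EQ-5D utilities copied from the table above (subset of states)
eq5d &lt;- c(S4 = 0.830, S5 = 0.775, S6 = 0.820, S7 = 0.550, S8 = 0.000)

# NED-normalised utility: each state&#39;s utility relative to S6 (NED) = 1.00
round(eq5d / eq5d[&quot;S6&quot;], 3)
#    S4    S5    S6    S7    S8
# 1.012 0.945 1.000 0.671 0.000</pre>
</details>
</div>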
</section>
<section id="qaly-comparison-by-treatment-arm" class="level3">
<h3 class="anchored" data-anchor-id="qaly-comparison-by-treatment-arm">QALY Comparison by Treatment Arm</h3>
<p>With utility weights and state occupancy times established, QALYs can be compared across the two treatment arms. The stacked bar chart below shows the utility-weighted life years accrued in each health state over the 60-month horizon. Each bar is broken down by health state, so the chart reveals not just the total QALY difference between arms but <em>where</em> that difference arises: which states contribute most of each arm’s advantage, and which contribute similarly to both.</p>
<div class="cell">
<details class="code-fold">
<summary>QALY comparison chart</summary>
<pre># Expected utility-weighted months per state per arm,
# using state occupancy integrals and EQ-5D utility weights.
# Total QALYs = sum(time_in_state_months * utility) / 12
qaly_surg &lt;- time_surg * utility
qaly_def  &lt;- time_def  * utility

state_palette &lt;- c(
  S1 = &quot;#E67E22&quot;, S2 = &quot;#8E44AD&quot;, S3 = &quot;#F39C12&quot;,
  S4 = &quot;#2980B9&quot;, S5 = &quot;#16A085&quot;, S6 = &quot;#27AE60&quot;, S7 = &quot;#C0392B&quot;
)

df_qaly &lt;- data.frame(
  State   = rep(paste0(&quot;S&quot;, 1:8), 2),
  Label   = rep(state_labels_ii, 2),
  Arm     = rep(c(&quot;Surgery&quot;, &quot;Definitive RT&quot;), each = 8),
  QALY_mo = c(qaly_surg, qaly_def)
) |&gt;
  dplyr::filter(State != &quot;S8&quot;) |&gt;
  dplyr::mutate(
    State = factor(State, levels = paste0(&quot;S&quot;, 1:7)),
    Arm   = factor(Arm,   levels = c(&quot;Surgery&quot;, &quot;Definitive RT&quot;))
  )

totals_label &lt;- data.frame(
  Arm   = factor(c(&quot;Surgery&quot;, &quot;Definitive RT&quot;),
                 levels = c(&quot;Surgery&quot;, &quot;Definitive RT&quot;)),
  total = c(sum(qaly_surg), sum(qaly_def)) / 12,
  label = paste0(round(c(sum(qaly_surg), sum(qaly_def)) / 12, 2), &quot; QALYs&quot;)
)

ggplot(df_qaly, aes(x = Arm, y = QALY_mo / 12, fill = State)) +
  geom_col(width = 0.55) +
  geom_text(
    data        = totals_label,
    aes(x = Arm, y = total + 0.06, label = label),
    inherit.aes = FALSE,
    fontface    = &quot;bold&quot;, size = 4
  ) +
  scale_fill_manual(
    values = state_palette,
    labels = setNames(state_labels_ii[1:7], paste0(&quot;S&quot;, 1:7))
  ) +
  scale_y_continuous(
    expand = expansion(mult = c(0, 0.08)),
    breaks = seq(0, 3, 0.5)
  ) +
  labs(
    x        = NULL,
    y        = &quot;Quality-Adjusted Life Years (QALYs)&quot;,
    fill     = &quot;Health State&quot;,
    title    = &quot;Expected QALYs by Health State and Treatment Arm&quot;,
    subtitle = &quot;60-month horizon; utility weights from EQ-5D literature defaults&quot;,
    caption  = &quot;State S8 (Death, utility = 0) excluded; contributes 0 QALYs&quot;
  ) +
  theme_bw(base_size = 12)</pre>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><a href="https://i1.wp.com/rworks.dev/posts/oscc-patient-model/index_files/figure-html/unnamed-chunk-23-1.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-7" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/oscc-patient-model/index_files/figure-html/unnamed-chunk-23-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></a></p>
</figure>
</div>
</div>
</div>
<section id="a-few-things-worth-noting-from-the-chart" class="level4">
<h4 class="anchored" data-anchor-id="a-few-things-worth-noting-from-the-chart">A few things worth noting from the chart:</h4>
<ul>
<li>Surgery accumulates 3.04 QALYs vs. 2.90 QALYs for Definitive RT over the 60-month horizon, a difference of 0.146 QALYs.</li>
<li>The dominant contributor in both arms is NED (green), which is expected given its high utility (0.82) and long sojourn time.</li>
<li>Surgery’s advantage comes largely from the PostOpSurveillance (orange) segment, which contributes ~0.72 QALYs accrued almost entirely in the Surgery arm.</li>
<li>The LR Recurrence segment is visibly larger for the Definitive RT arm (0.26 QALYs vs. 0.19), consistent with the higher recurrence probability built into that arm’s jump chain.</li>
<li>These QALY estimates are model-based (from Q_base × utility) and do not yet incorporate cost.</li>
</ul>
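<p>The headline numbers in the first two bullets can be reproduced directly from the table’s grand-summary row. A quick arithmetic check using the rounded table values, not the underlying model objects:</p>
<div class="cell">
<details class="code-fold">
<summary>Arithmetic check on totals</summary>
<pre># Totals and NED (S6) contributions, copied from the table (rounded)
totals &lt;- c(Surgery = 3.043, DefinitiveRT = 2.897)
ned    &lt;- c(Surgery = 1.970, DefinitiveRT = 2.429)

unname(totals[&quot;Surgery&quot;] - totals[&quot;DefinitiveRT&quot;])  # 0.146 QALY difference
round(ned / totals, 2)  # NED share of each arm&#39;s total: 0.65 vs. 0.84</pre>
</details>
</div>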
</section>
</section>
</section>
<section id="discussion" class="level2">
<h2 class="anchored" data-anchor-id="discussion">Discussion</h2>
<p>The intended users of the class of models I am proposing are physicians and their support teams who are making treatment decisions and explaining the possible consequences of those treatments to their patients. In the first use case, physicians may find the formality of the model and its baseline estimates useful in finalizing their decisions and documenting their decision-making process. Having a model to inform treatment decisions should be especially helpful where multiple physicians collaborate on deciding treatment options. At the very least, a model would be useful in verifying shared assumptions.</p>
<p>The second use of the model, helping physicians and their care teams communicate with patients about the consequences of various treatment options, is more speculative. To be truly helpful to patients, physicians and their care teams must generate a compelling narrative for each health state, explaining the side effects experienced in that state along with the strategies that previous patients have employed to cope with them. At the very least, the exercise of explaining a journey through the various states and interpreting the QALYs may be useful in structuring patient conversations and setting expectations.</p>
</section>
<section id="technical-notes" class="level2">
<h2 class="anchored" data-anchor-id="technical-notes">Technical Notes</h2>
<section id="five-year-adjustment-process" class="level3">
<h3 class="anchored" data-anchor-id="five-year-adjustment-process">Five year adjustment process</h3>
<p>Five-year background survival for the general US population at age 62, <img src="https://latex.codecogs.com/png.latex?S%5E%7B(5)%7D_%7B62%7D">, is computed from the <a href="https://www.ssa.gov/oact/STATS/table4c6_2021_TR2024.html" rel="nofollow" target="_blank">2021 SSA Period Life Table</a> as the product of annual survival probabilities across ages 62 to 66:</p>
<p><img src="https://latex.codecogs.com/png.latex?S%5E%7B(5)%7D_%7B62%7D%20=%20%5Cprod_%7Bx=62%7D%5E%7B66%7D(1%20-%20q_x)"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?q_x"> is the one-year death probability at exact age <img src="https://latex.codecogs.com/png.latex?x"> (SSA notation). Using the 2021 table values for ages 62–66: <img src="https://latex.codecogs.com/png.latex?q_x%5E%7B%5Ctext%7Bmale%7D%7D"> = 0.01648, 0.01762, 0.01876, 0.01991, 0.02110 and <img src="https://latex.codecogs.com/png.latex?q_x%5E%7B%5Ctext%7Bfemale%7D%7D"> = 0.01014, 0.01085, 0.01155, 0.01222, 0.01295, giving <img src="https://latex.codecogs.com/png.latex?S%5E%7B(5)%7D_%7B62%7D%20=%200.910"> (male) and <img src="https://latex.codecogs.com/png.latex?0.944"> (female). The sex-weighted background survival is:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cbar%7BS%7D%5E%7B(5)%7D%20=%20w_M%20%5Ccdot%20S%5E%7B(5)%7D_M%20+%20w_F%20%5Ccdot%20S%5E%7B(5)%7D_F%0A=%200.70%20%5Ctimes%200.910%20+%200.30%20%5Ctimes%200.944%20=%200.920"></p>
<p>where the 70:30 sex split is from SEER oral cavity incidence data. Absolute five-year OS is then obtained as:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7BOS%7D_%7B%5Ctext%7Babs%7D%7D%20=%20r%20%5Ctimes%20%5Cbar%7BS%7D%5E%7B(5)%7D%20=%200.84%20%5Ctimes%200.920%20=%200.773"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?r%20=%200.84"> is the SEER five-year relative survival for localized oral cavity cancer (Siegel et al. 2024, <em>CA Cancer J Clin</em>, Figure 5). The code below details this calculation and provides a sensitivity check at age 65 (note: age-65 values are pending verification).</p>
<div class="cell">
<details class="code-fold">
<summary>Siegel 2024: relative → absolute OS derivation</summary>
<pre># US Social Security Administration period life tables (2021 release, used for 2019 SEER cohort).
# 5-year survival probabilities by sex at age 62 (closest to OSCC median diagnosis age).
# Computed as prod(1 - q_x) for x = 62:66 from Table 4c6 (2021 Trustees Report cycle).
# Source: Social Security Administration. Period Life Table, 2021 (2024 Trustees Report).
# https://www.ssa.gov/oact/STATS/table4c6.html
surv_male_62   &lt;- 0.9096  # 5-yr background survival, males age 62;   SSA 2021 prod(1-qx), x=62:66
surv_female_62 &lt;- 0.9436  # 5-yr background survival, females age 62; SSA 2021 prod(1-qx), x=62:66

# OSCC sex distribution: ~70% male, ~30% female (SEER, oral cavity)
prop_male &lt;- 0.70
expected_survival_5yr &lt;- prop_male * surv_male_62 + (1 - prop_male) * surv_female_62

# SEER 5-year relative survival for localized oral cavity (Siegel 2024, Fig. 5)
relative_survival_5yr &lt;- 0.84

# Absolute OS = relative survival × expected (background) survival
absolute_os_5yr &lt;- relative_survival_5yr * expected_survival_5yr

# Sensitivity: 5-yr background survival at age 65 (upper end of median diagnosis age range).
# Source: Social Security Administration. Period Life Table, 2021 (2024 Trustees Report).
# Computed as prod(1 - q_x) for x = 65:69. https://www.ssa.gov/oact/STATS/table4c6.html
surv_male_65   &lt;- 0.8923  # 5-yr background survival, males age 65;   SSA 2021 prod(1-qx), x=65:69
surv_female_65 &lt;- 0.9320  # 5-yr background survival, females age 65; SSA 2021 prod(1-qx), x=65:69
expected_surv_65 &lt;- prop_male * surv_male_65 + (1 - prop_male) * surv_female_65
absolute_os_65   &lt;- relative_survival_5yr * expected_surv_65

data.frame(
  Assumption              = c(&quot;Age 62 (lower bound)&quot;, &quot;Age 65 (upper bound)&quot;),
  Background_5yr_survival = round(c(expected_survival_5yr, expected_surv_65), 3),
  Absolute_5yr_OS         = round(c(absolute_os_5yr, absolute_os_65), 3)
)</pre>
</details>
<div class="cell-output cell-output-stdout">
<pre>            Assumption Background_5yr_survival Absolute_5yr_OS
1 Age 62 (lower bound)                   0.920           0.773
2 Age 65 (upper bound)                   0.904           0.760</pre>
</div>
<details class="code-fold">
<summary>Siegel 2024: derived absolute OS values</summary>
<pre># Age 62: background survival 0.920 × relative survival 0.84 = 0.773 (~77%).
# Age 65 sensitivity: background survival 0.904 × relative survival 0.84 = 0.760 (~76%).</pre>
</details>
</div>
</section>
<section id="using-the-code-with-real-data" class="level3">
<h3 class="anchored" data-anchor-id="using-the-code-with-real-data">Using the Code with Real Data</h3>
<p>There is no doubt that models based on real patient data would be more convincing. To this end, I have structured the format of the generated synthetic data file so that it is suitable for fitting a continuous-time Markov chain with the <code>R</code> package <code>msm</code>. I have verified that <code>msm</code> is able to fit the eight-state sample model from the synthetic data, and it is reasonable to expect that models of similar complexity could be fit from real patient data.</p>
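<p>A minimal sketch of such a fit, assuming a panel-format data frame <code>panel_df</code> with columns <code>id</code>, <code>months</code>, and <code>state</code> (1–8). The initial intensity matrix below is an illustrative simplification, not the model’s exact transition structure, which is defined earlier in the post:</p>
<div class="cell">
<details class="code-fold">
<summary>Fitting the CTMC with msm (illustrative)</summary>
<pre>library(msm)

# Illustrative initial intensity matrix: non-zero off-diagonal entries mark
# the transitions the model allows; msm uses them as starting values.
# This structure is a hypothetical simplification for the sketch.
Q_init &lt;- matrix(0, 8, 8)
Q_init[1, c(4, 5, 8)] &lt;- 0.10   # e.g. Surgery -&gt; PORT / POCRT / Death
Q_init[2, c(6, 7, 8)] &lt;- 0.10
Q_init[3, c(6, 7, 8)] &lt;- 0.10
Q_init[4, c(6, 7, 8)] &lt;- 0.10
Q_init[5, c(6, 7, 8)] &lt;- 0.10
Q_init[6, c(7, 8)]    &lt;- 0.05
Q_init[7, c(6, 8)]    &lt;- 0.05

fit &lt;- msm(state ~ months, subject = id, data = panel_df,
           qmatrix = Q_init, deathexact = 8)  # death (S8) observed exactly

pmatrix.msm(fit, t = 60)  # 60-month transition probability matrix
sojourn.msm(fit)          # fitted mean sojourn time per transient state</pre>
</details>
</div>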
</section>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p>The literature is enormous; the references in this section are neither complete nor definitive. They have been selected to support the default values for the user inputs.</p>
<section id="references-for-background-mortality" class="level3">
<h3 class="anchored" data-anchor-id="references-for-background-mortality">References for Background Mortality</h3>
<ul>
<li>Social Security Administration. <em>Period Life Table, 2021</em> (as used in the 2024 Trustees Report). Available at: <a href="https://www.ssa.gov/oact/STATS/table4c6.html" class="uri" rel="nofollow" target="_blank">https://www.ssa.gov/oact/STATS/table4c6.html</a>. Accessed 2026-03-18. Five-year background survival probabilities for the general US population at ages 62 and 65, computed as <img src="https://latex.codecogs.com/png.latex?S%5E%7B(5)%7D%20=%20%5Cprod_%7Bx%7D%5E%7Bx+4%7D(1-q_x)"> where <img src="https://latex.codecogs.com/png.latex?q_x"> is the one-year death probability at exact age <img src="https://latex.codecogs.com/png.latex?x">. Values at age 62: male <img src="https://latex.codecogs.com/png.latex?S%5E%7B(5)%7D%20=%200.910">, female <img src="https://latex.codecogs.com/png.latex?S%5E%7B(5)%7D%20=%200.944">. Used to convert SEER relative survival (Siegel 2024) to absolute OS for the calibration benchmark. Values at age 65: male <img src="https://latex.codecogs.com/png.latex?S%5E%7B(5)%7D%20=%200.892">, female <img src="https://latex.codecogs.com/png.latex?S%5E%7B(5)%7D%20=%200.932"> (computed from <img src="https://latex.codecogs.com/png.latex?q_%7B65%7D">–<img src="https://latex.codecogs.com/png.latex?q_%7B69%7D"> = 0.01991, 0.02110, 0.02242, 0.02385, 0.02536 male; 0.01222, 0.01295, 0.01384, 0.01486, 0.01603 female).</li>
</ul>
</section>
<section id="references-for-transition-probabilities-and-sojourn-times" class="level3">
<h3 class="anchored" data-anchor-id="references-for-transition-probabilities-and-sojourn-times">References for Transition Probabilities and Sojourn Times</h3>
<ul>
<li><p><a href="https://pubmed.ncbi.nlm.nih.gov/15128894/" rel="nofollow" target="_blank">Bernier et al. (2004)</a> “Postoperative Irradiation with or without Concomitant Chemotherapy for Locally Advanced Head and Neck Cancer”, <em>N Engl J Med.</em> 350:1945–1952. PMID: 15128894. Landmark EORTC 22931 RCT establishing concurrent chemoRT as standard of care for high-risk post-operative HNSCC. Supports jump_P[5,6] and the locoregional control rates in the POCRT arm (S5).</p></li>
<li><p><a href="https://doi.org/10.3390/jcm11237061" rel="nofollow" target="_blank">Blatt et al. (2022)</a> “Tumor Recurrence and Follow-Up Intervals in Oral Squamous Cell Carcinoma”, <em>J Clin Med.</em> 11(23):7061. PMID: 36498636. PMC: PMC9740063. DOI: 10.3390/jcm11237061. University Medical Centre Mainz, n = 760 OSCC patients. Supports S3 NED declaration timing ~24 months (mean recurrence interval 24 ± 26 months; 50% of recurrences by 24 months) and S3 mean sojourn revision to 22 months.</p></li>
<li><p><a href="https://onlinelibrary.wiley.com/doi/10.1002/cam4.2124" rel="nofollow" target="_blank">Brands et al. (2019)</a> “Time Patterns of Recurrence and Second Primary Tumors in a Large Cohort of Patients Treated for Oral Cavity Cancer”, <em>Cancer Med.</em> 8(12):5810–5819. DOI: 10.1002/cam4.2124. Retrospective cohort of 594 OSCC patients; the great majority of recurrences occur in the first 2 years; 5-year cumulative second-event risk ~30%. Supports S3 sojourn revision to 22 months and S6 NED calibration.</p></li>
<li><p><a href="https://pubmed.ncbi.nlm.nih.gov/36155359/" rel="nofollow" target="_blank">Contrera et al. (2022)</a> “Outcomes for Recurrent Oral Cavity Squamous Cell Carcinoma”, <em>Oral Oncol.</em> 134:106127. PMID: 36155359. DOI: 10.1016/j.oraloncology.2022.106127. MD Anderson, n = 259 salvage surgeries (1997-2011); 5-year OS 44.2% for surgical candidates vs. 1.5% for nonsurgical therapy; 51% second recurrence at median 17 months. Provides contextual support for the salvage surgery framework; primary source for current values jump_P[7,6] = 0.43 and jump_P[7,8] = 0.57 is Lee et al. (2024).</p></li>
<li><p><a href="https://www.nejm.org/doi/full/10.1056/NEJMoa032646" rel="nofollow" target="_blank">Cooper et al. (2004)</a> “Postoperative Concurrent Radiotherapy and Chemotherapy for High-Risk Squamous-Cell Carcinoma of the Head and Neck”, <em>N Engl J Med.</em> 350:1937–1944. PMID: 15084618. RTOG 9501 RCT; locoregional control ~60% at 5 years in the concurrent chemoRT arm. Supports S5 jump probabilities.</p></li>
<li><p><a href="https://doi.org/10.1002/lio2.70363" rel="nofollow" target="_blank">Correia et al. (2026)</a> “Timely Matters: Predictors of Delay in Oral Cavity Cancer Patients Across the Care Continuum”. <em>Laryngoscope Investig Otolaryngol.</em> 11(2):e70363. DOI: 10.1002/lio2.70363. (Also available via <a href="https://escholarship.org/uc/item/9680b1bw" rel="nofollow" target="_blank">eScholarship</a>.) n = 93 OCSCC patients. Median surgery-to-adjuvant RT interval 8.4 weeks (same institution) and 9.3 weeks (different facility). Supports S1 mean sojourn revision to 1.5 months.</p></li>
<li><p><a href="https://doi.org/10.1016/j.oraloncology.2023.106622" rel="nofollow" target="_blank">Dayan et al. (2023)</a> “Predictors of prolonged treatment time intervals in oral cavity cancer”, <em>Oral Oncol.</em> 106622. DOI: 10.1016/j.oraloncology.2023.106622. CHUM Montreal, n = 145 multimodal OCSCC patients. Median surgery-to-PORT interval 64 days = 2.1 months. Supports S1 mean sojourn revision.</p></li>
<li><p><a href="https://onlinelibrary.wiley.com/doi/10.1097/00005537-200003001-00001" rel="nofollow" target="_blank">Goodwin (2000)</a> “Salvage Surgery for Patients with Recurrent Squamous Cell Carcinoma of the Upper Aerodigestive Tract: When Do the Ends Justify the Means?”, <em>Laryngoscope.</em> 110(suppl 93):1–18. Cited in v2 for salvage success rate 15–25%. Superseded in v3 by Lee et al. 2024 for jump_P[7,6]; retained for historical context.</p></li>
<li><p><a href="https://acsjournals.onlinelibrary.wiley.com/doi/10.1002/cncr.30651" rel="nofollow" target="_blank">Graboyes et al. (2017)</a> “Adherence to National Comprehensive Cancer Network Guidelines for Time to Initiation of Postoperative Radiation Therapy for Patients with Head and Neck Cancer”, <em>Cancer.</em> 123(14):2651–2660. DOI: 10.1002/cncr.30651. NCDB analysis, n = 47,273 HNSCC patients; over 50% failed to initiate PORT within the NCCN-recommended 6 weeks. Supports real-world S1 mean sojourn estimate of 1.5 months.</p></li>
<li><p><a href="https://doi.org/10.1016/j.ijrobp.2018.09.013" rel="nofollow" target="_blank">Hosni et al. (2019)</a> “Predictors of Early Recurrence Prior to Planned Postoperative Radiation Therapy for Oral Cavity Squamous Cell Carcinoma and Outcomes Following Salvage Intensified Radiation Therapy”, <em>Int J Radiat Oncol Biol Phys.</em> 103(2):363–373. PMID: 30244160. Princess Margaret Cancer Centre, n = 601 OSCC patients (2003–2015); 3-year OS ~71% (95% CI 67–75%) after adjuvant PORT. Supports S4 jump probabilities (jump_P[4, 6:8]).</p></li>
<li><p><a href="https://doi.org/10.1002/lary.27191" rel="nofollow" target="_blank">Katsoulakis et al. (2018)</a> “Long-term outcomes in oral cavity squamous cell carcinoma with adjuvant and salvage radiotherapy after surgery”, <em>Laryngoscope.</em> PMID: 29637571. DOI: 10.1002/lary.27191. Memorial Sloan Kettering / VA Tampa; Evangelia Katsoulakis (first author). Provides contextual reference for the adjuvant RT pathway; jump_P[2,6] is now sourced from Sher et al. (2011, Dana-Farber) and Studer et al. (2007), below.</p></li>
<li><p><a href="https://pubmed.ncbi.nlm.nih.gov/39243149/" rel="nofollow" target="_blank">Lee et al. (2024)</a> “Clinical Outcome of Salvage Surgery in Patients with Recurrent Oral Cavity Cancer: A Systematic Review and Meta-Analysis”, <em>Head Neck.</em> 46(11):2901–2909. PMID: 39243149. DOI: 10.1002/hed.27928. 14 retrospective cohort studies, n = 2,069; pooled 5-year OS after salvage surgery = 43.0%; late-relapse subgroup 63.8% vs. 30.0% for early relapse. Primary source for jump_P[7,6] = 0.43 (salvage success) and jump_P[7,8] = 0.57 (mortality); revised from 0.25/0.75 — previous values over-estimated post-recurrence mortality relative to this pooled estimate.</p></li>
<li><p><a href="https://pubmed.ncbi.nlm.nih.gov/17210345/" rel="nofollow" target="_blank">Liu et al. (2007)</a> “Impact of Recurrence Interval on Survival of Oral Cavity Squamous Cell Carcinoma Patients after Local Relapse”, <em>Otolaryngol Head Neck Surg.</em> 136(1):112–118. PMID: 17210345. DOI: 10.1016/j.otohns.2006.07.002. n = 1,687 oral cancer patients; 5-year OS after local recurrence 31.56%; recurrence within 18 months predicted higher mortality. Supports S7 sojourn of 12.5 months.</p></li>
<li><p><a href="https://www.nejm.org/doi/full/10.1056/NEJMoa1514493" rel="nofollow" target="_blank">Mehanna et al. (2016)</a> “PET-CT Surveillance versus Neck Dissection in Advanced Head and Neck Cancer”, <em>N Engl J Med.</em> 374(15):1444–1454. PMID: 26958921. DOI: 10.1056/NEJMoa1514493. PET-NECK RCT, n = 564 patients; response imaging performed at 12 weeks post-chemoradiotherapy; established ≥12-week post-RT interval as standard for response assessment in HNSCC. Supports S2 mean sojourn of 3.0 months (RT course 1.5 mo + ≥12-week response assessment window).</p></li>
<li><p><a href="https://journals.sagepub.com/doi/abs/10.1177/0194599814551718" rel="nofollow" target="_blank">Luryi et al. (2014)</a> “Positive Surgical Margins in Early Stage Oral Cavity Cancer: An Analysis of 20,602 Cases”, <em>Otolaryngol Head Neck Surg.</em> 151(6):984–990. PMID: 25210849. Large NCDB analysis of surgical margin rates and downstream outcomes in early-stage oral cavity cancer; early-stage N0 patients with negative margins have materially lower recurrence rates than all-stage series. Supports S3 transition probability framework, jump_P[3,7] = 0.18, and S6 NED calibration to 120 months.</p></li>
<li><p><a href="https://doi.org/10.18203/issn.2454-5929.ijohns20252980" rel="nofollow" target="_blank">Nathan et al. (2025)</a> “The Influence of Reconstruction following Hemiglossectomy on Perioperative Outcomes”, <em>Int J Otorhinolaryngol Head Neck Surg.</em> DOI: 10.18203/issn.2454-5929.ijohns20252980. NSQIP database study, n = 866 hemiglossectomy patients (2008–2022); consistent with ~1–2% 30-day mortality in modern series. Supports jump_P[1,8] = 0.015.</p></li>
<li><p><a href="https://pubmed.ncbi.nlm.nih.gov/16916677/" rel="nofollow" target="_blank">Ord, Kolokythas & Reynolds (2006)</a> “Surgical Salvage for Local and Regional Recurrence in Oral Cancer”, <em>J Oral Maxillofac Surg.</em> 64(9):1409–1414. PMID: 16916677. DOI: 10.1016/j.joms.2006.05.026. Reports outcomes of surgical salvage in recurrent oral cancer; data on local and regional recurrence rates and survival after primary surgery. Supports jump_P[3,7] = 0.18 (LR recurrence 12–18% for Stage II) and S3 mean sojourn revision to 22 months.</p></li>
<li><p><a href="https://www.sciencedirect.com/science/article/pii/S0360301611003245" rel="nofollow" target="_blank">Sher et al. (2011)</a> “Treatment of Oral Cavity Squamous Cell Carcinoma with Adjuvant or Definitive Intensity-Modulated Radiation Therapy”, <em>Int J Radiat Oncol Biol Phys.</em> 2011. PMID: 21531515. Dana-Farber Cancer Institute; n = 42 OCSCC (30 adjuvant IMRT, 12 definitive IMRT); Stage I–IV (24% Stage II); 2-yr LRC for definitive IMRT = 64%, 2-yr OS = 63%. Corroborated by Studer G, Zwahlen RA, Graetz KW et al. (2007), “IMRT in oral cavity cancer”, <em>Radiat Oncol.</em> 2:16. PMID: 17430599. DOI: 10.1186/1748-717X-2-16. University Hospital Zurich; n = 58 OCC IMRT patients; T1 LC = 95%; T2–4 and recurred stages LC ~50–60% at 2 years; definitive IMRT LC = 43% (but cohort was 69% T3/4 or recurred). <strong>Primary reference pair for jump_P[2,6] = 0.65</strong> — definitive RT arm locoregional control for T2N0M0 Stage II; value is consistent with the early-stage subgroup in both series.</p></li>
<li><p><a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8831824/" rel="nofollow" target="_blank">Shetty et al. (2022)</a> “Salvage Surgery in Recurrent Oral Squamous Cell Carcinoma”, <em>Front Oral Health.</em> PMC8831824. Review of salvage surgery outcomes; radiation-naive recurrent OCSCC after salvage surgery: 5-year OS 59%, recurrence-free survival 60%. Provides supporting context for the salvage surgery framework; current value jump_P[7,6] = 0.43 is sourced from Lee et al. (2024).</p></li>
<li><p><a href="https://doi.org/10.3322/caac.21820" rel="nofollow" target="_blank">Siegel et al. (2024)</a> “Cancer Statistics, 2024”, <em>CA Cancer J Clin.</em> 74(1):12–49. PMID: 38230766. DOI: 10.3322/caac.21820. Annual American Cancer Society SEER-based statistics report. Figure 5 reports 5-year relative survival for localized oral cavity cancer (SEER localised stage, encompassing Stage I–II) as approximately 84% (diagnoses 2013–2019, follow-up through 2020). Note: relative survival overstates absolute OS — adjusting for sex-weighted background mortality from the SSA 2021 Period Life Table at median age 62 (<img src="https://latex.codecogs.com/png.latex?%5Cbar%7BS%7D%5E%7B(5)%7D%20=%200.920">) yields an estimated absolute 5-yr OS of approximately 77% (<img src="https://latex.codecogs.com/png.latex?0.84%20%5Ctimes%200.920%20=%200.773">) for Stage II localised oral cavity. Used as the primary population-level calibration benchmark for the Surgery arm.</p></li>
<li><p><a href="https://pubmed.ncbi.nlm.nih.gov/28533474/" rel="nofollow" target="_blank">Szturz et al. (2017)</a> “Weekly Low-Dose Versus Three-Weekly High-Dose Cisplatin for Concurrent Chemoradiation in Locoregionally Advanced Non-Nasopharyngeal Head and Neck Cancer: A Systematic Review and Meta-Analysis”, <em>Oncologist.</em> 22(9):1056–1066. PMID: 28533474.</p></li>
<li><p><a href="https://jamanetwork.com/journals/jamaotolaryngology/fullarticle/2618943" rel="nofollow" target="_blank">Tam et al. (2017)</a> “Estimating Survival After Salvage Surgery for Recurrent Oral Cavity Cancer”, <em>JAMA Otolaryngol Head Neck Surg.</em> 143(7):685–690. PMID: 28448645. Reports survival outcomes following salvage surgery for recurrent oral cavity SCC. Supports S7 transition structure.</p></li>
<li><p><a href="https://doi.org/10.1002/ohn.205" rel="nofollow" target="_blank">Tassone et al. (2023)</a> “Going Off Guidelines: An NCDB Analysis of Missed Adjuvant Therapy Among Surgically Treated Oral Cavity Cancer”, <em>Otolaryngol Head Neck Surg.</em> 169(3):632–641. PMID: 36939392. DOI: 10.1002/ohn.205. NCDB analysis, n = 53,503; establishes PORT vs. POCRT indications and allocation rates. Supports jump_P[1,4] = 0.25 and jump_P[1,5] = 0.15.</p></li>
<li><p><a href="https://onlinelibrary.wiley.com/doi/abs/10.1002/hed.20234" rel="nofollow" target="_blank">Temam et al. (2005)</a> “Treatment of the N0 Neck during Salvage Surgery after Radiotherapy of Head and Neck Squamous Cell Carcinoma”, <em>Head Neck.</em> 27(8):653–658. Cited in v2 for salvage success rates. Superseded in v3 by Lee et al. 2024; retained for historical context.</p></li>
</ul>
</section>
<section id="references-for-eq-5d-utilities" class="level3">
<h3 class="anchored" data-anchor-id="references-for-eq-5d-utilities">References for EQ-5D Utilities</h3>
<ul>
<li><p><a href="https://onlinelibrary.wiley.com/doi/abs/10.1002/hed.23340" rel="nofollow" target="_blank">de Almeida et al. (2014)</a> “Preferences and Utilities for Health States after Treatment for Oropharyngeal Cancer: Transoral Robotic Surgery versus Definitive (Chemo)radiotherapy”, <em>Head Neck.</em> 36(4):529–539. Reports EQ-5D utility values across treatment modalities; informs utility weights for treatment and NED states.</p></li>
<li><p><a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/coa.12502" rel="nofollow" target="_blank">Govers et al. (2016)</a> “Quality of Life after Different Procedures for Regional Control in Oral Cancer Patients: Cross-Sectional Survey”, <em>Clin Otolaryngol.</em> 41(3):228–235. EQ-5D-3L measurement in oral cavity OSCC patients following different regional treatment approaches.</p></li>
<li><p><a href="https://doi.org/10.1186/s12955-017-0748-z" rel="nofollow" target="_blank">Meregaglia & Cairns (2017)</a> “A Systematic Literature Review of Health State Utility Values in Head and Neck Cancer”, <em>Health Qual Life Outcomes.</em> 15(1):174. DOI: 10.1186/s12955-017-0748-z. PMID: 28865475. Systematic review of 28 studies and 346 health state utility values in HNC; recommends EQ-5D as the preferred instrument. Confirms that evidence for recurrent and metastatic HNC states is sparse: the only patient-reported EQ-5D for recurrence found in the review is a median of 0.70 (del Barco et al. 2016), from a palliative-intent recurrent/metastatic cohort — a different clinical context from S7 (locoregional recurrence with potential salvage intent) in this model. The S7 utility of 0.55 is a modeller’s assumption informed by this evidence gap, not a directly reported value from this review.</p></li>
<li><p><a href="https://jamanetwork.com/journals/jamaotolaryngology/fullarticle/2397443#:~:text=Design%2C%20Setting%2C%20and%20Participants%20In,Mark%203%20(HUI3)%20questionnaire." rel="nofollow" target="_blank">Noel et al. (2015)</a> “Comparison of Health State Utility Measures in Patients With Head and Neck Cancer”, <em>JAMA Otolaryngol Head Neck Surg.</em> 141(8):696–703. Prospective, cross-sectional, and longitudinal study of 100 patients with squamous cell carcinoma of the upper aerodigestive tract. Mean EQ-5D-5L = 0.82 three months post-treatment with no evidence of recurrence. Primary source for NED (S6) utility value.</p></li>
<li><p><a href="https://doi.org/10.1016/j.oraloncology.2011.05.012" rel="nofollow" target="_blank">Ramaekers et al. (2011)</a> “The Impact of Late Treatment-Toxicity on Generic Health-Related Quality of Life in Head and Neck Cancer Patients after Radiotherapy”, <em>Oral Oncol.</em> 47(8):768–774. DOI: 10.1016/j.oraloncology.2011.05.012. Multi-centre cross-sectional survey; EQ-5D measured in HNC patients <strong>at least 6 months post-RT</strong> (late-effects survivorship cohort, not active treatment). Reports xerostomia and dysphagia as independent predictors of reduced utility post-RT. Retained as contextual reference for late RT toxicity; <strong>not</strong> the primary source for S4 (PORT) utility — that cohort does not represent patients during active adjuvant radiotherapy.</p></li>
<li><p><a href="https://doi.org/10.1186/s12885-022-10346-4" rel="nofollow" target="_blank">Sprave et al. (2022)</a> “Patient Reported Outcomes Based on EQ-5D-5L Questionnaires in Head and Neck Cancer Patients: A Real-World Study”, <em>BMC Cancer</em> 22:1236. DOI: 10.1186/s12885-022-10346-4. PMID: 36447175. Freiburg University; n = 366 H&N cancer patients undergoing modern RT; prospective real-world PRO study; mean EQ-5D-5L at end of RT = 0.830 (SD not reported for full cohort). <strong>Primary source for S2 (Definitive RT) and S4 (PORT) utility weights = 0.83.</strong> Corroborated by Sprave et al. (2020, n=49): mean EQ-5D at RT completion = 0.828.</p></li>
<li><p><a href="https://doi.org/10.1080/14737167.2020.1823220" rel="nofollow" target="_blank">Sprave et al. (2020)</a> “Characterization of Health-Related Quality of Life Based on the EQ-5D-5L Questionnaire in Head-and-Neck Cancer Patients Undergoing Modern Radiotherapy”, <em>Expert Rev Pharmacoecon Outcomes Res.</em> 20(6):673–682. DOI: 10.1080/14737167.2020.1823220. PMID: 32912005. Freiburg University Medical Center; n = 49 H&N cancer patients (57% definitive, 41% adjuvant RT); mean EQ-5D-5L at RT completion = 0.828 (SD 0.16). Corroborates Sprave 2022; supports S4 utility revision. Note: no significant permanent HRQOL deterioration observed at 6-month follow-up.</p></li>
<li><p><a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC5303682/" rel="nofollow" target="_blank">Truong et al. / RTOG 0522 (2017)</a> “Quality of Life and Performance Status from a Substudy Conducted Within a Prospective Phase 3 Randomized Trial of Concurrent Accelerated Radiation Plus Cisplatin With or Without Cetuximab for Locally Advanced Head and Neck Carcinoma: NRG Oncology RTOG 0522”, <em>Int J Radiat Oncol Biol Phys.</em> 97(4):687–699. PMC5303682. PMID: 27727066. DOI: 10.1016/j.ijrobp.2016.08.003. Prospective phase III RCT, n = 818 analyzable patients (oropharynx, hypopharynx, larynx; Stage III–IV); EQ-5D-3L collected at pretreatment, last 2 weeks of treatment, 3 months, and annually. <strong>Reported EQ-5D values:</strong> baseline ~0.79 (CIS 0.78, CET/CIS 0.80); at 3 months from treatment start ~0.775 (CIS 0.78, CET/CIS 0.77); at 1 year ~0.84. <strong>Note: the EQ-5D value for the ‘within last 2 weeks of treatment’ time point was collected but not reported in the paper</strong> — the authors identified omission of an acute FACT-HN assessment as a study limitation. The 3-month value (0.775) is used as the closest available proxy for active-treatment utility in S5 (POCRT). Secondary limitation: cohort is definitively treated oropharynx/larynx (not postoperative oral cavity POCRT); adopted as best available CRT utility source. <strong>Comparison with Sprave et al. (2022):</strong> the Sprave 2022 adjuvant CRT cohort (baseline HI = 0.849, stable to RT completion; CRT vs RT-alone not significantly different at RT completion, p = 0.624) would support a value of ~0.83 for S5. The RTOG 0522-derived value of 0.775 is retained as a conservative estimate reflecting the higher toxicity of POCRT (high-dose cisplatin, post-surgical) relative to the mixed Sprave 2022 cohort. A sensitivity analysis using 0.83 would reduce the Surgery vs. Def RT QALY difference by ~0.009 QALYs. 
<strong>Primary source for S5 (POCRT) utility = 0.775.</strong> Not the primary source for S2 (RT alone); see Sprave et al. (2022).</p></li>
</ul>


</section>
</section>

 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://rworks.dev/posts/oscc-patient-model/"> R Works</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/stage-ii-oscc-health-economics-model/">Stage II OSCC — Health Economics Model</a>]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">400433</post-id>	</item>
		<item>
		<title>Why Most Time Series Models Fail Before They Start</title>
		<link>https://www.r-bloggers.com/2026/04/why-most-time-series-models-fail-before-they-start/</link>
		
		<dc:creator><![CDATA[M. Fatih Tüzen]]></dc:creator>
		<pubDate>Wed, 15 Apr 2026 21:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://mfatihtuzen.netlify.app/posts/2026-04-16_timeseries_stationary/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>1 A model can run and still be fundamentally wrong<br />
Many time series models fail before they even begin. Not because the software crashes. Not because the code is wrong. But because the data entering the model violate one of the most impor...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/why-most-time-series-models-fail-before-they-start/">Why Most Time Series Models Fail Before They Start</a>]]></description>
					<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://mfatihtuzen.netlify.app/posts/2026-04-16_timeseries_stationary/"> A Statistician&#039;s R Notebook</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
 






<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://i2.wp.com/mfatihtuzen.netlify.app/posts/2026-04-16_timeseries_stationary/timeseries_stationary.png?w=578&#038;ssl=1" class="img-fluid quarto-figure quarto-figure-center figure-img" data-recalc-dims="1"></p>
</figure>
</div>
<section id="a-model-can-run-and-still-be-fundamentally-wrong" class="level2" data-number="1">
<h2 data-number="1" class="anchored" data-anchor-id="a-model-can-run-and-still-be-fundamentally-wrong"><span class="header-section-number">1</span> A model can run and still be fundamentally wrong</h2>
<p>Many time series models fail before they even begin. Not because the software crashes. Not because the code is wrong. But because the data entering the model violate one of the most important assumptions in time series analysis: <strong>stationarity</strong>.</p>
<p>This is where many analyses quietly go off the rails. A model is estimated, forecasts are produced, coefficients look serious, and the graphs appear convincing. But the model may be chasing a moving target rather than learning a stable data-generating mechanism.</p>
<p>In this post, we will work with a real macroeconomic series rather than a toy example. The data come from the <strong>Consumer Price Index for All Urban Consumers: All Items (CPIAUCSL)</strong>, published by the U.S. Bureau of Labor Statistics and distributed through FRED. FRED describes CPIAUCSL as a monthly, seasonally adjusted price index and notes that percent changes in the index are commonly used to measure inflation.</p>
<p>Because live API access may fail in some institutional or offline environments, this workflow uses a <strong>locally downloaded CSV file</strong> instead of fetching the series on the fly. You can download the file directly from the <a href="https://fred.stlouisfed.org/series/CPIAUCSL" rel="nofollow" target="_blank">CPIAUCSL page on FRED</a>.</p>
<p>The goal is simple: show why raw time series levels often mislead us, what stationarity really means, and why transformations such as differencing and log-differencing are not cosmetic tricks but conceptual necessities.</p>
</section>
<section id="what-stationarity-really-means" class="level2" data-number="2">
<h2 data-number="2" class="anchored" data-anchor-id="what-stationarity-really-means"><span class="header-section-number">2</span> What stationarity really means</h2>
<p>In informal language, a stationary series is one whose behavior does not drift in a systematic way over time. More formally, a weakly stationary process (<img src="https://latex.codecogs.com/png.latex?X_t">) satisfies three conditions:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AE(X_t)%20=%20%5Cmu%0A"></p>
<p><img src="https://latex.codecogs.com/png.latex?%0AVar(X_t)%20=%20%5Csigma%5E2%0A"></p>
<p><img src="https://latex.codecogs.com/png.latex?%0ACov(X_t,%20X_%7Bt-k%7D)%20=%20%5Cgamma_k%0A"></p>
<p>The first condition says the mean does not change over time. The second says the variance is constant. The third says the covariance between observations depends only on the lag (k), not on calendar time itself.</p>
<p>This matters because a large part of classical time series modeling is built on the idea that the stochastic structure is stable. When that structure is drifting, many familiar tools become unreliable or at least much harder to interpret. A trending series can generate strong autocorrelation even when the underlying dynamic structure is weak. A persistent upward path can trick the analyst into seeing “model fit” where the model is merely inheriting inertia from the level of the series.</p>
<p>Put differently: without stationarity, a model may explain movement without actually explaining the mechanism.</p>
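<p>The three conditions can be made concrete with a short simulation. The sketch below (base R only; <code>stats::filter</code> builds the AR(1) recursion, and the toy parameters are illustrative) contrasts a stationary AR(1) with a random walk driven by the same shocks: the AR(1) keeps returning to its mean, while the walk's level is free to wander.</p>

```r
# Stationary AR(1) vs. random walk, driven by the same shocks
set.seed(42)
n   <- 500
eps <- rnorm(n)                                       # iid shocks

ar1 <- as.numeric(stats::filter(eps, 0.5, method = "recursive"))  # stationary AR(1)
rw  <- cumsum(eps)                                    # random walk: unit root

# Split-half means: roughly equal for the AR(1), free to drift apart for the walk
half_means <- function(x) c(mean(x[1:(n / 2)]), mean(x[(n / 2 + 1):n]))
half_means(ar1)
half_means(rw)
```

<p>Re-running with a different seed changes the numbers but not the pattern: only the random walk's split-half means can diverge without bound, which is exactly the failure of the constant-mean condition above.</p>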
</section>
<section id="load-the-cpi-data-from-a-csv-file" class="level2" data-number="3">
<h2 data-number="3" class="anchored" data-anchor-id="load-the-cpi-data-from-a-csv-file"><span class="header-section-number">3</span> Load the CPI data from a CSV file</h2>
<p>Download the CSV file for <strong>CPIAUCSL</strong> from the official FRED series page and save it in your working directory with the name <code>CPIAUCSL.csv</code>. The file typically includes the columns <code>observation_date</code> and <code>CPIAUCSL</code>. FRED is the distribution platform, while the source agency for the series is the U.S. Bureau of Labor Statistics.</p>
<div class="cell">
<pre>library(readr)
library(dplyr)
library(ggplot2)
library(tibble)
library(zoo)
library(scales)
library(patchwork)
library(tseries)

cpi_tbl &lt;- read_csv(&quot;CPIAUCSL.csv&quot;, show_col_types = FALSE) %&gt;%
  transmute(
    date = as.Date(observation_date),
    cpi  = as.numeric(CPIAUCSL)
  ) %&gt;%
  arrange(date) %&gt;%
  filter(!is.na(date), !is.na(cpi))

cpi_tbl %&gt;% slice_head(n = 5)</pre>
<div class="cell-output cell-output-stdout">
<pre># A tibble: 5 × 2
  date         cpi
  &lt;date&gt;     &lt;dbl&gt;
1 1947-01-01  21.5
2 1947-02-01  21.6
3 1947-03-01  22  
4 1947-04-01  22  
5 1947-05-01  22.0</pre>
</div>
</div>
<p>The line <code>filter(!is.na(date), !is.na(cpi))</code> is important. If your CSV has an <code>NA</code> for a month such as October 2025, that observation is safely excluded from the analysis instead of silently breaking the workflow.</p>
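<p>The effect of that guard is easy to verify on a toy data frame (the values below are made up, not from the real FRED file); the same logic in base R:</p>

```r
# Toy data frame with one missing observation, mimicking a partial CSV
toy <- data.frame(
  date = as.Date(c("2025-08-01", "2025-09-01", "2025-10-01")),
  cpi  = c(322.1, 323.0, NA)   # hypothetical index values; last month missing
)

# Keep only rows where both date and cpi are present
clean <- toy[!is.na(toy$date) & !is.na(toy$cpi), ]
nrow(clean)   # the NA month is dropped, leaving 2 rows
```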
</section>
<section id="start-with-the-visual-story-not-the-test-statistic" class="level2" data-number="4">
<h2 data-number="4" class="anchored" data-anchor-id="start-with-the-visual-story-not-the-test-statistic"><span class="header-section-number">4</span> Start with the visual story, not the test statistic</h2>
<p>In time series analysis, the first serious diagnostic is often visual rather than formal. That is not because tests are unimportant. It is because plots let us see the basic character of the data before we start compressing everything into a p-value.</p>
<p>If a series has a visible trend, changing volatility, sudden level shifts, or unusual gaps, that already tells us something about whether a stationary model is likely to behave well.</p>
<section id="the-raw-cpi-level" class="level3" data-number="4.1">
<h3 data-number="4.1" class="anchored" data-anchor-id="the-raw-cpi-level"><span class="header-section-number">4.1</span> The raw CPI level</h3>
<div class="cell">
<pre>p_level &lt;- ggplot(cpi_tbl, aes(x = date, y = cpi)) +
  geom_line(linewidth = 0.9, color = &quot;#1B4965&quot;) +
  labs(
    title = &quot;U.S. CPI (CPIAUCSL): level series&quot;,
    subtitle = &quot;Monthly, seasonally adjusted index from FRED&quot;,
    x = NULL,
    y = &quot;Index&quot;
  ) +
  scale_y_continuous(labels = label_number()) +
  theme_minimal(base_size = 12)

p_level</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i2.wp.com/mfatihtuzen.netlify.app/posts/2026-04-16_timeseries_stationary/index_files/figure-html/unnamed-chunk-2-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<p>Even before applying a formal statistical test, the visual pattern already tells us something important. The CPI level series does not oscillate around a stable mean; instead, it follows a persistent upward path over time. This alone raises an immediate warning against modeling the raw level series as if it were stationary.</p>
<p>The graph also suggests that the increase is not perfectly uniform across the entire sample. In some periods, the slope becomes steeper, indicating faster price growth, while in others the series evolves more gradually. In other words, the series appears to contain not only a long-run trend but also changes in inflation dynamics over time.</p>
<p>This is precisely why visual inspection should be the first step in time series analysis. Before looking at test statistics or fitting a model, we should ask a simpler question: does the series <em>look</em> like it fluctuates around a constant level? In this case, the answer is clearly no.</p>
<p>A smooth and steadily rising curve may look statistically innocent at first glance, but in practice it is often a sign that the raw series is carrying trend information that must be addressed before modeling.</p>
</section>
<section id="rolling-summaries-to-deepen-the-visual-diagnosis" class="level3" data-number="4.2">
<h3 data-number="4.2" class="anchored" data-anchor-id="rolling-summaries-to-deepen-the-visual-diagnosis"><span class="header-section-number">4.2</span> Rolling summaries to deepen the visual diagnosis</h3>
<p>A single line plot is useful, but local summaries make the visual argument sharper. Below, I compute a 24-month rolling mean and rolling standard deviation.</p>
<div class="cell">
<pre>cpi_roll &lt;- cpi_tbl %&gt;%
  mutate(
    roll_mean_24 = zoo::rollmean(cpi, k = 24, fill = NA, align = &quot;right&quot;),
    roll_sd_24   = zoo::rollapply(cpi, width = 24, FUN = sd, fill = NA, align = &quot;right&quot;)
  )

p_roll_mean &lt;- ggplot(cpi_roll, aes(date, roll_mean_24)) +
  geom_line(linewidth = 0.9, color = &quot;#2A9D8F&quot;) +
  labs(
    title = &quot;24-month rolling mean of CPI&quot;,
    x = NULL,
    y = &quot;Rolling mean&quot;
  ) +
  theme_minimal(base_size = 12)

p_roll_sd &lt;- ggplot(cpi_roll, aes(date, roll_sd_24)) +
  geom_line(linewidth = 0.9, color = &quot;#E76F51&quot;) +
  labs(
    title = &quot;24-month rolling standard deviation of CPI&quot;,
    x = NULL,
    y = &quot;Rolling SD&quot;
  ) +
  theme_minimal(base_size = 12)

p_roll_mean / p_roll_sd</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i2.wp.com/mfatihtuzen.netlify.app/posts/2026-04-16_timeseries_stationary/index_files/figure-html/unnamed-chunk-3-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<p>If the series were approximately stationary, we would expect these rolling statistics to fluctuate around relatively stable levels over time. In particular, the rolling mean should remain close to a constant value, and the rolling standard deviation should not exhibit systematic shifts.</p>
<p>However, the evidence here points in the opposite direction. The rolling mean shows a clear and persistent upward drift, reinforcing what we observed in the raw series: the central tendency is not stable, but evolving over time.</p>
<p>The rolling standard deviation tells a more nuanced story. While it remains relatively moderate for long periods, there are noticeable fluctuations and, more importantly, a pronounced spike in recent years. This indicates that the variability of the series is not constant and may respond to underlying economic conditions or shocks.</p>
<p>Taken together, these two plots suggest that the series violates the key assumptions of stationarity—both in terms of mean and variance. While rolling statistics alone do not formally prove non-stationarity, they provide strong visual evidence that the raw series is not suitable for direct modeling without transformation.</p>
</section>
</section>
<section id="why-raw-cpi-levels-are-a-good-example" class="level2" data-number="5">
<h2 data-number="5" class="anchored" data-anchor-id="why-raw-cpi-levels-are-a-good-example"><span class="header-section-number">5</span> Why raw CPI levels are a good example</h2>
<p>CPI is ideal for illustrating this problem because the level series typically trends upward over time. That is not a defect in the data; it is what a price index often does. But from a modeling perspective, it creates trouble.</p>
<p>If the level keeps drifting upward, then the mean is not constant. If the size of movements changes as the level rises, the variance may also appear unstable. In such a setting, fitting a model directly to the raw series can mix long-run inflationary drift with short-run dynamic behavior.</p>
<p>Economically, analysts are usually not interested in the index level itself as much as they are interested in <strong>inflation</strong>, that is, the rate at which the price level changes. Statistically, this is convenient too, because transforming the series from levels to changes often brings it closer to stationarity.</p>
</section>
<section id="a-statistical-check-the-augmented-dickey-fuller-test" class="level2" data-number="6">
<h2 data-number="6" class="anchored" data-anchor-id="a-statistical-check-the-augmented-dickey-fuller-test"><span class="header-section-number">6</span> A statistical check: the Augmented Dickey-Fuller test</h2>
<p>Visual diagnosis matters, but it is usually not enough. A commonly used statistical tool is the <strong>Augmented Dickey-Fuller (ADF) test</strong>, which tests for the presence of a unit root. In practical terms, the test is often used to assess whether a series behaves like a non-stationary process with a persistent stochastic trend.</p>
<p>The null hypothesis of the ADF test is that the series has a unit root. That means the burden of proof is asymmetric:</p>
<ul>
<li>a <strong>large</strong> p-value means we do <strong>not</strong> have strong evidence against non-stationarity,</li>
<li>a <strong>small</strong> p-value means the data are more consistent with stationarity.</li>
</ul>
<p>That distinction is easy to say and easy to misuse. Failing to reject the null is not the same thing as proving a series is non-stationary beyond all doubt. It simply means the test did not find enough evidence against the unit-root view.</p>
<p>Let us start with the raw CPI level.</p>
<div class="cell">
<pre>adf_level &lt;- tseries::adf.test(cpi_tbl$cpi)
adf_level</pre>
<div class="cell-output cell-output-stdout">
<pre>
    Augmented Dickey-Fuller Test

data:  cpi_tbl$cpi
Dickey-Fuller = -0.1813, Lag order = 9, p-value = 0.99
alternative hypothesis: stationary</pre>
</div>
</div>
<p>The Augmented Dickey–Fuller (ADF) test provides a formal way to assess whether the series contains a unit root. The null hypothesis of the test is that the series is non-stationary (i.e., it has a unit root), while the alternative hypothesis is stationarity.</p>
<p>In this case, the p-value is extremely high (p ≈ 0.99), so we fail to reject the null hypothesis. In other words, the data provide no statistical evidence that the CPI level series is stationary.</p>
<p>However, this result should not be interpreted in isolation. Statistical tests and visual diagnostics should complement each other. The high p-value is entirely consistent with what we observed earlier: the series exhibits a strong upward trend and does not fluctuate around a constant mean.</p>
<p>Taken together, both the visual evidence and the ADF test point to the same conclusion — the raw CPI level behaves more like a drifting (unit root) process than a stationary one. This reinforces the need for transforming the series before attempting any meaningful modeling.</p>
</section>
<section id="the-first-rescue-differencing" class="level2" data-number="7">
<h2 data-number="7" class="anchored" data-anchor-id="the-first-rescue-differencing"><span class="header-section-number">7</span> The first rescue: differencing</h2>
<p>One of the oldest and most important ideas in time series analysis is that differencing can remove certain forms of trend. The first difference is</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5CDelta%20X_t%20=%20X_t%20-%20X_%7Bt-1%7D%0A"></p>
<p>This transformation asks a different question. Instead of modeling the level, we model the change from one period to the next.</p>
<div class="cell">
<pre>cpi_diff_tbl &lt;- cpi_tbl %&gt;%
  mutate(diff_cpi = c(NA, diff(cpi))) %&gt;%
  filter(!is.na(diff_cpi))

p_diff &lt;- ggplot(cpi_diff_tbl, aes(x = date, y = diff_cpi)) +
  geom_line(linewidth = 0.8, color = &quot;#6D597A&quot;) +
  labs(
    title = &quot;First difference of CPI&quot;,
    subtitle = &quot;Absolute month-to-month change in the index&quot;,
    x = NULL,
    y = expression(Delta*CPI)
  ) +
  theme_minimal(base_size = 12)

p_diff</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i2.wp.com/mfatihtuzen.netlify.app/posts/2026-04-16_timeseries_stationary/index_files/figure-html/unnamed-chunk-5-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<p>Taking the first difference removes a large part of the visible trend in the series. Compared to the raw CPI level, the differenced series fluctuates much more around a relatively stable center, which is an encouraging sign from a modeling perspective.</p>
<p>However, differencing does not fully solve the problem. While it helps stabilize the mean, the variability of the series still appears to change over time, particularly in more recent periods where larger fluctuations are observed. This suggests that the series may still violate the constant variance assumption.</p>
<p>There is also a more subtle but important issue: interpretation. The first difference represents absolute changes in the index, not relative ones. In macroeconomic data, a one-point increase in CPI does not carry the same meaning when the index is around 100 versus when it exceeds 300. As the scale of the series grows, the same absolute change reflects a smaller proportional movement.</p>
<p>In other words, differencing improves the statistical properties of the series, but it does not yet provide a fully consistent or interpretable measure of change. This is why we often go one step further and consider transformations based on relative (percentage) changes.</p>
</section>
<section id="the-more-meaningful-rescue-log-differences" class="level2" data-number="8">
<h2 data-number="8" class="anchored" data-anchor-id="the-more-meaningful-rescue-log-differences"><span class="header-section-number">8</span> The more meaningful rescue: log differences</h2>
<p>This is where the log transformation becomes more than a technical detail. Consider</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5CDelta%20%5Clog(X_t)%20=%20%5Clog(X_t)%20-%20%5Clog(X_%7Bt-1%7D)%0A"></p>
<p>For moderate changes, this is approximately the proportional growth rate. In the CPI context, it moves us from the language of index levels toward the language of inflation.</p>
<p>That shift is both statistical and economic.</p>
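<p>How good is the "approximately the growth rate" claim? A standalone base-R check, using toy index values chosen to give exact 1% and 2% steps:</p>

```r
# For small changes, diff(log(x)) is close to the simple percent change
x <- c(100, 101, 103.02)              # toy index: +1%, then +2%

log_growth <- diff(log(x))            # log differences
pct_growth <- diff(x) / head(x, -1)   # exact proportional changes

all.equal(log_growth, pct_growth, tolerance = 0.01)  # TRUE: agreement to ~1%
```

<p>The gap between the two measures grows with the size of the change, which is why log differences are a safe stand-in for monthly inflation rates but a poorer one for, say, annual changes during high-inflation episodes.</p>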
<div class="cell">
<pre>cpi_log_tbl &lt;- cpi_tbl %&gt;%
  mutate(
    log_cpi = log(cpi),
    dlog_cpi = c(NA, diff(log_cpi)),
    annualized_inflation_pct = 1200 * dlog_cpi,
    yoy_inflation_pct = 100 * (cpi / lag(cpi, 12) - 1)
  )

p_dlog &lt;- cpi_log_tbl %&gt;%
  filter(!is.na(annualized_inflation_pct)) %&gt;%
  ggplot(aes(x = date, y = annualized_inflation_pct)) +
  geom_line(linewidth = 0.8, color = &quot;#D62828&quot;) +
  labs(
    title = &quot;Monthly log-difference of CPI (annualized)&quot;,
    subtitle = &quot;A close cousin of short-run inflation&quot;,
    x = NULL,
    y = &quot;Percent&quot;
  ) +
  theme_minimal(base_size = 12)

p_yoy &lt;- cpi_log_tbl %&gt;%
  filter(!is.na(yoy_inflation_pct)) %&gt;%
  ggplot(aes(x = date, y = yoy_inflation_pct)) +
  geom_line(linewidth = 0.8, color = &quot;#F4A261&quot;) +
  labs(
    title = &quot;Year-over-year CPI inflation&quot;,
    subtitle = &quot;A slower-moving inflation measure&quot;,
    x = NULL,
    y = &quot;Percent&quot;
  ) +
  theme_minimal(base_size = 12)

p_dlog / p_yoy</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i2.wp.com/mfatihtuzen.netlify.app/posts/2026-04-16_timeseries_stationary/index_files/figure-html/unnamed-chunk-6-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<p>Two key insights emerge from these transformations.</p>
<p>First, moving from levels to rates of change fundamentally improves interpretability. The log-difference series represents approximate percentage changes — in this context, a close proxy for short-run inflation. This is the quantity economists actually care about. A 1% increase has the same meaning regardless of whether the index is at 100 or 300, making comparisons over time much more meaningful.</p>
<p>Second, the transformation has a clear impact on the statistical properties of the series. Compared to the raw level and even the first difference, the log-differenced series fluctuates more consistently around a stable mean. While it still exhibits volatility spikes and occasional outliers, the overall behavior is much closer to what we would expect from a stationary process.</p>
<p>The comparison between the two plots is also instructive. The monthly log-difference captures short-term fluctuations and reacts quickly to shocks, while the year-over-year inflation series smooths out this noise and highlights longer-term inflation dynamics. Both are useful, but they answer different questions.</p>
<p>To put it bluntly: you did not just transform the data — you changed the question.</p>
</section>
<section id="re-test-after-transformation" class="level2" data-number="9">
<h2 data-number="9" class="anchored" data-anchor-id="re-test-after-transformation"><span class="header-section-number">9</span> Re-test after transformation</h2>
<p>Let us apply the ADF test again, this time to the log-differenced series.</p>
<div class="cell">
<pre>adf_dlog &lt;- cpi_log_tbl %&gt;%
  filter(!is.na(dlog_cpi)) %&gt;%
  pull(dlog_cpi) %&gt;%
  tseries::adf.test()

adf_dlog</pre>
<div class="cell-output cell-output-stdout">
<pre>
    Augmented Dickey-Fuller Test

data:  .
Dickey-Fuller = -4.3862, Lag order = 9, p-value = 0.01
alternative hypothesis: stationary</pre>
</div>
</div>
<p>The contrast between the two ADF test results is striking and highly informative.</p>
<p>For the raw CPI level, we failed to reject the null hypothesis of a unit root, indicating that the series behaves as a non-stationary process. In contrast, for the log-differenced series, the p-value drops to around 0.01, allowing us to reject the null hypothesis and conclude that the transformed series is consistent with stationarity.</p>
<p>This shift is not just a technical detail — it reflects a fundamental change in how the data behaves. The transformation has effectively removed the persistent trend component and brought the series closer to a stable statistical structure.</p>
<p>That said, the test result should always be interpreted alongside the visual evidence. The ADF test provides formal confirmation, but the intuition comes from the plots. What we saw visually — a drifting level series versus a mean-reverting transformed series — is now supported by statistical testing.</p>
<p>In essence, the workflow comes full circle:<br>
we start with a problematic series, diagnose the issue visually, apply a transformation, and then verify the improvement formally.</p>
<p>This is the core of time series thinking.</p>
</section>
<section id="a-subtle-but-crucial-point-transformation-changes-interpretation" class="level2" data-number="10">
<h2 data-number="10" class="anchored" data-anchor-id="a-subtle-but-crucial-point-transformation-changes-interpretation"><span class="header-section-number">10</span> A subtle but crucial point: transformation changes interpretation</h2>
<p>This is the point where many explanations remain superficial.</p>
<p>When you difference a series, you are not merely “cleaning” it — you are redefining the object of analysis.</p>
<ul>
<li>Modeling <strong>CPI levels</strong> asks how the price index evolves over time.</li>
<li>Modeling <strong>first differences</strong> asks how much the index changes from one period to the next.</li>
<li>Modeling <strong>log differences</strong> asks about proportional change, which is directly linked to inflation.</li>
</ul>
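<p>In code, these three objects differ by a single transformation, yet each answers a different question (a sketch, assuming a CPI series stored in an object named <code>cpi</code>):</p>
<pre>level    &lt;- cpi              # the price index itself
abs_diff &lt;- diff(cpi)        # absolute period-to-period change
log_diff &lt;- diff(log(cpi))   # approximate proportional change, i.e. inflation-like</pre>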
<p>These are not equivalent statistical questions, and they are certainly not equivalent economic questions.</p>
<p>This is why time series preprocessing should never be treated as a mechanical step. Every transformation involves a trade-off: it improves certain statistical properties while simultaneously altering the meaning of the data.</p>
<p>Understanding that trade-off is not optional — it is central to sound time series analysis.</p>
</section>
<section id="why-this-matters-for-arima-style-modeling" class="level2" data-number="11">
<h2 data-number="11" class="anchored" data-anchor-id="why-this-matters-for-arima-style-modeling"><span class="header-section-number">11</span> Why this matters for ARIMA-style modeling</h2>
<p>ARIMA models are often presented as if the workflow were mechanical: inspect the series, difference if needed, identify orders, estimate parameters, check residuals, and forecast. While this workflow is useful, it can create the misleading impression that differencing is simply a procedural step — a box to tick.</p>
<p>It is not.</p>
<p>Differencing is a deliberate modeling choice. Its purpose is to separate persistent, trend-like behavior from shorter-run dynamics. If you skip it when it is needed, your model may inherit non-stationarity and produce unreliable or misleading inference. If you apply it excessively, you risk removing meaningful structure and end up modeling noise.</p>
<p>The real question, therefore, is not “Should I difference?” but rather:<br>
<strong>What feature of the data am I trying to stabilize, and what question do I want the model to answer?</strong></p>
</section>
<section id="a-compact-comparison-1" class="level2" data-number="13">
<h2 data-number="13" class="anchored" data-anchor-id="a-compact-comparison-1"><span class="header-section-number">13</span> A compact comparison</h2>
<table class="caption-top table">
<colgroup>
<col style="width: 23%">
<col style="width: 23%">
<col style="width: 23%">
<col style="width: 30%">
</colgroup>
<thead>
<tr class="header">
<th>Series version</th>
<th>What it represents</th>
<th>Typical issue</th>
<th>When it helps (and when it does not)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>CPI level</td>
<td>The price index itself</td>
<td>Strong trend, likely unit root</td>
<td>Poor starting point for stationary modeling</td>
</tr>
<tr class="even">
<td>First difference</td>
<td>Absolute period-to-period change</td>
<td>Still scale-dependent</td>
<td>Reduces trend, but interpretation remains limited</td>
</tr>
<tr class="odd">
<td>Log difference</td>
<td>Approximate proportional change</td>
<td>May still show volatility bursts</td>
<td>More suitable for modeling inflation-type dynamics</td>
</tr>
<tr class="even">
<td>Year-over-year change</td>
<td>Annual percentage change</td>
<td>Smoother, less responsive</td>
<td>Useful for communication, less suited for short-run analysis</td>
</tr>
</tbody>
</table>
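<p>For monthly data, the year-over-year row in the table corresponds to a 12-period lag; in R this can be approximated as a log difference at lag 12 (a sketch, assuming a monthly <code>ts</code> object named <code>cpi</code>):</p>
<pre># Approximate year-over-year percentage change for a monthly series
yoy &lt;- 100 * diff(log(cpi), lag = 12)</pre>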
</section>
<section id="common-mistakes" class="level2" data-number="14">
<h2 data-number="14" class="anchored" data-anchor-id="common-mistakes"><span class="header-section-number">14</span> Common mistakes</h2>
<p>Most mistakes in time series analysis are not computational — they are conceptual.</p>
<p><strong>Mistake 1: fitting models directly to raw levels because the plot “looks smooth.”</strong><br>
Smoothness is not stationarity. A strong trend can produce visually smooth series that are statistically problematic.</p>
<p><strong>Mistake 2: treating differencing as a harmless default.</strong><br>
Differencing changes the meaning of the data. It may improve statistical properties while quietly reducing interpretability if applied without care.</p>
<p><strong>Mistake 3: relying on a single test result as final truth.</strong><br>
The ADF test is useful, but it is only one piece of evidence. Visual inspection, domain knowledge, structural breaks, and alternative tests all matter.</p>
<p><strong>Mistake 4: forgetting the economics.</strong><br>
In the case of CPI, the focus is typically on inflation, not the index level itself. A good transformation is one that improves statistical validity while remaining aligned with the economic question.</p>
<p>Taken together, these mistakes point to a simple lesson:<br>
<strong>time series analysis is not about applying steps — it is about making informed choices.</strong></p>
</section>
<section id="final-thoughts" class="level2" data-number="15">
<h2 data-number="15" class="anchored" data-anchor-id="final-thoughts"><span class="header-section-number">15</span> Final thoughts</h2>
<p>Most time series models do not fail because we cannot estimate them. They fail because we model the wrong object.</p>
<p>The raw CPI series is a clear reminder that not every observed series is ready for modeling. A trending index is rarely an appropriate input for a stationary model. Once we difference — and especially log-difference — the data, the series becomes more interpretable, more stable, and much closer to the type of process that classical time series methods are designed to handle.</p>
<p>So before asking whether your model is sophisticated enough, ask a more fundamental question:</p>
<p><strong>Am I modeling a stable process — or just chasing drift?</strong></p>
<p>In many cases, the answer to this question matters far more than whether you choose AR(1), ARIMA(1,1,1), or any other fashionable specification.</p>
</section>
<section id="references-and-further-reading" class="level2" data-number="16">
<h2 data-number="16" class="anchored" data-anchor-id="references-and-further-reading"><span class="header-section-number">16</span> References and further reading</h2>
<section id="data-sources" class="level3" data-number="16.1">
<h3 data-number="16.1" class="anchored" data-anchor-id="data-sources"><span class="header-section-number">16.1</span> Data sources</h3>
<ul>
<li><p>FRED, Federal Reserve Bank of St. Louis. <em>Consumer Price Index for All Urban Consumers: All Items (CPIAUCSL).</em><br>
<a href="https://fred.stlouisfed.org/series/CPIAUCSL" class="uri" rel="nofollow" target="_blank">https://fred.stlouisfed.org/series/CPIAUCSL</a></p></li>
<li><p>FRED API documentation. <em>St. Louis Fed Web Services: FRED® API.</em><br>
<a href="https://fred.stlouisfed.org/docs/api/fred/" class="uri" rel="nofollow" target="_blank">https://fred.stlouisfed.org/docs/api/fred/</a></p></li>
</ul>
<hr>
</section>
<section id="core-time-series-references" class="level3" data-number="16.2">
<h3 data-number="16.2" class="anchored" data-anchor-id="core-time-series-references"><span class="header-section-number">16.2</span> Core time series references</h3>
<ul>
<li><p>Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). <em>Time Series Analysis: Forecasting and Control.</em> Wiley.</p></li>
<li><p>Hyndman, R. J., & Athanasopoulos, G. (2021). <em>Forecasting: Principles and Practice (3rd ed.).</em><br>
<a href="https://otexts.com/fpp3/" class="uri" rel="nofollow" target="_blank">https://otexts.com/fpp3/</a></p></li>
<li><p>Hamilton, J. D. (1994). <em>Time Series Analysis.</em> Princeton University Press.</p></li>
</ul>
<hr>
</section>
<section id="stationarity-and-unit-root-testing" class="level3" data-number="16.3">
<h3 data-number="16.3" class="anchored" data-anchor-id="stationarity-and-unit-root-testing"><span class="header-section-number">16.3</span> Stationarity and unit root testing</h3>
<ul>
<li><p>Dickey, D. A., & Fuller, W. A. (1979). <em>Distribution of the estimators for autoregressive time series with a unit root.</em> Journal of the American Statistical Association.</p></li>
<li><p>Said, S. E., & Dickey, D. A. (1984). <em>Testing for unit roots in autoregressive-moving average models of unknown order.</em> Biometrika.</p></li>
</ul>
<hr>
</section>
<section id="transformations-and-interpretation" class="level3" data-number="16.4">
<h3 data-number="16.4" class="anchored" data-anchor-id="transformations-and-interpretation"><span class="header-section-number">16.4</span> Transformations and interpretation</h3>
<ul>
<li><p>Stock, J. H., & Watson, M. W. (2019). <em>Introduction to Econometrics.</em> Pearson.</p></li>
<li><p>Tsay, R. S. (2010). <em>Analysis of Financial Time Series.</em> Wiley.</p></li>
</ul>
<hr>
</section>
<section id="practical-r-resources" class="level3" data-number="16.5">
<h3 data-number="16.5" class="anchored" data-anchor-id="practical-r-resources"><span class="header-section-number">16.5</span> Practical R resources</h3>
<ul>
<li><p>R Core Team. <em>R: A Language and Environment for Statistical Computing.</em><br>
<a href="https://www.r-project.org/" class="uri" rel="nofollow" target="_blank">https://www.r-project.org/</a></p></li>
<li><p>Hyndman, R. J. et al. <em>forecast package documentation.</em><br>
<a href="https://pkg.robjhyndman.com/forecast/" class="uri" rel="nofollow" target="_blank">https://pkg.robjhyndman.com/forecast/</a></p></li>
</ul>
<hr>
</section>
<section id="suggested-next-steps-for-readers" class="level3" data-number="16.6">
<h3 data-number="16.6" class="anchored" data-anchor-id="suggested-next-steps-for-readers"><span class="header-section-number">16.6</span> Suggested next steps for readers</h3>
<p>If you want to go deeper, consider exploring:</p>
<ul>
<li>Unit root tests beyond ADF (KPSS, Phillips–Perron)</li>
<li>Structural breaks and regime changes</li>
<li>Seasonal differencing and SARIMA models</li>
<li>Volatility modeling (ARCH/GARCH)</li>
</ul>
<p>These topics build directly on the ideas discussed in this article and will deepen your understanding of time series behavior.</p>



</section>
</section>

 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://mfatihtuzen.netlify.app/posts/2026-04-16_timeseries_stationary/"> A Statistician&#039;s R Notebook</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/why-most-time-series-models-fail-before-they-start/">Why Most Time Series Models Fail Before They Start</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400599</post-id>	</item>
		<item>
		<title>logrittr: A Verbose Pipe Operator for Logging dplyr Pipelines</title>
		<link>https://www.r-bloggers.com/2026/04/logrittr-a-verbose-pipe-operator-for-logging-dplyr-pipelines/</link>
		
		<dc:creator><![CDATA[Guillaume Pressiat]]></dc:creator>
		<pubDate>Wed, 15 Apr 2026 08:05:50 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://guillaumepressiat.github.io/blog/2026/04/logrittr</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>  dplyr verbs are descriptive: let’s make them more verbose!</p>
<p>  Yet another pipe for R.</p>
<p>Motivation</p>
<p>In SAS, every DATA step prints a log:</p>
<p>NOTE: There were 120000 observations read from WORK.SALES.<br />
NOTE: 7153 observations wer...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/logrittr-a-verbose-pipe-operator-for-logging-dplyr-pipelines/">logrittr: A Verbose Pipe Operator for Logging dplyr Pipelines</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://guillaumepressiat.github.io/blog/2026/04/logrittr"> Guillaume Pressiat</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p><a href="https://github.com/guillaumepressiat/logrittr" rel="nofollow" target="_blank">
<img src="https://i2.wp.com/github.com/GuillaumePressiat/logrittr/raw/main/man/figures/logo.png?w=15%25&#038;ssl=1" style="float:right;padding-bottom: 20px;padding-right:30%" data-recalc-dims="1" />
</a></p>

<p><br /></p>

<blockquote>
  <p>dplyr verbs are descriptive: let’s make them more verbose!</p>
</blockquote>

<blockquote>
  <p>Yet another pipe for R.</p>
</blockquote>

<p><br />
<br /></p>

<span id="more-400575"></span>

<center>
<a href="https://github.com/guillaumepressiat/logrittr" rel="nofollow" target="_blank">
<img src="https://i2.wp.com/guillaumepressiat.github.io/images/pastels_example.png?w=85%25&#038;ssl=1" data-recalc-dims="1" />
</a>
</center>

<p><br /></p>
<hr width="50%" />

<p><br /></p>

<h2 id="motivation">Motivation</h2>

<p>In SAS, every DATA step prints a log:</p>

<figure class="highlight"><pre>NOTE: There were 120000 observations read from WORK.SALES.
NOTE: 7153 observations were deleted.
NOTE: The data set WORK.RESULT has 112847 observations and 11 variables.</pre></figure>

<p>R’s <code>dplyr</code> pipelines are silent. <code>logrittr</code> fills that gap with <code>%&gt;=%</code>, a
drop-in pipe that logs row counts, column counts, added/dropped columns, and
timing at every step, with no function masking.</p>

<p>With <a href="https://github.com/tonsky/FiraCode" rel="nofollow" target="_blank">Fira Code</a> ligatures, <code>%&gt;=%</code> renders
as a single wide arrow, visually similar to <code>%&gt;%</code> but with an underline added: like a subtitle, it lets you read between the lines of a pipeline and see what happened.</p>

<h2 id="multiples-contexts">Multiple contexts</h2>

<p>Things happen:</p>

<figure class="highlight"><pre>NOTE: There were 120000 observations read from WORK.SALES.
NOTE: 120000 observations were deleted.
NOTE: The data set WORK.RESULT has 0 observations and 11 variables.</pre></figure>

<p>“This is where we lost all the rows during the script execution.”</p>

<h4 id="pro">Pro</h4>

<p>Reading this log long after a script has run helps you:</p>

<ul>
  <li>see what happened at each stage of data processing without having to rerun the code, for example in a production environment where the input data is constantly changing</li>
  <li>monitor key processes</li>
  <li>explain what happened when you need to (during an audit, for example)</li>
</ul>

<p>In professional contexts, this is often required.</p>

<h4 id="educational">Educational</h4>

<p>A console log will also make things clearer for people with little experience with
the tidyverse: those who are taking their first steps in programming by following a tutorial or teaching themselves.</p>

<h2 id="installation">Installation</h2>

<figure class="highlight"><pre>install.packages('logrittr', repos = 'https://guillaumepressiat.r-universe.dev')

# or from github
# devtools::install_github(&quot;GuillaumePressiat/logrittr&quot;)</pre></figure>

<p>See <a href="https://github.com/guillaumepressiat/logrittr" rel="nofollow" target="_blank">github</a> or <a href="https://guillaumepressiat.r-universe.dev/logrittr" rel="nofollow" target="_blank">r-universe</a>.</p>

<h2 id="usage">Usage</h2>

<figure class="highlight"><pre>library(logrittr)
library(dplyr)

iris %&gt;=%
  as_tibble() %&gt;=%
  filter(Sepal.Length &lt; 5)  %&gt;=%
  mutate(rn = row_number()) %&gt;=%
  semi_join(
    iris %&gt;% as_tibble() %&gt;=%
      filter(Species == &quot;setosa&quot;),
    by = &quot;Species&quot;
  )  %&gt;=%
  group_by(Species) %&gt;=%
  summarise(n = n_distinct(rn))</pre></figure>

<figure class="highlight"><pre>── iris  [rows:       150  cols:    5] ─────────────────────────────────────────────────────
&#x2139; as_tibble()                            rows:       150 +0        cols:    5 +0    [   0.0 ms]
&#x2139; filter(Sepal.Length &lt; 5)               rows:        22 -128      cols:    5 +0    [   3.0 ms]
&#x2139; mutate(rn = row_number())              rows:        22 +0        cols:    6 +1    [   1.0 ms]
  added: rn
&#x2139; &gt; filter(Species == &quot;setosa&quot;)          rows:        50 -100      cols:    5 +0    [   1.0 ms]
&#x2139; semi_join(iris %&gt;% as_tibble() %&gt;=%    rows:        20 -2        cols:    6 +0    [   5.0 ms]
  filter(Species == &quot;setosa&quot;), by =
  &quot;Species&quot;)
&#x2139; group_by(Species)                      rows:        20 +0        cols:    6 +0    [   3.0 ms]
&#x2139; summarise(n = n_distinct(rn))          rows:         1 -19       cols:    2 -4    [   2.0 ms]
  dropped: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, rn
  added: n</pre></figure>

<h3 id="screenshot">Screenshot</h3>

<center>
<a href="https://guillaumepressiat.r-universe.dev/logrittr" rel="nofollow" target="_blank">
<img src="https://i1.wp.com/guillaumepressiat.github.io/images/nycflights13_example.png?w=85%25&#038;ssl=1" data-recalc-dims="1" />
</a>
</center>

<p><br /></p>

<figure class="highlight"><pre>library(dplyr)
library(logrittr)

logrittr_options(lang = &quot;en&quot;, big_mark = &quot;,&quot;, wrap_width = NULL, max_cols = 3)

nycflights13::flights %&gt;=% 
  as_tibble() %&gt;=%
  group_by(year, month, day) %&gt;=% 
  count() %&gt;=% 
  tidyr::pivot_wider(values_from = &quot;n&quot;, names_from = &quot;day&quot;) %&gt;=% 
  glimpse()</pre></figure>

<h2 id="related-package-tidylog">Related package: <code>tidylog</code></h2>

<p><a href="https://github.com/elbersb/tidylog" rel="nofollow" target="_blank">tidylog</a> is a really neat package that gave me the motivation for this one.
<code>tidylog</code> works by masking dplyr functions, which is not ideal to me.</p>

<p>This was also an opportunity for me to try a new programming tool that is widely used at the moment.</p>

<p><code>logrittr</code> uses a custom pipe operator and never touches
the dplyr namespace. Its console output is colorful and informative thanks to the cli package.</p>

<h2 id="working-with-lumberjack">Working with <code>lumberjack</code></h2>

<p>If you already know the <a href="https://github.com/markvanderloo/lumberjack" rel="nofollow" target="_blank">lumberjack</a> package,
logrittr provides compatibility with it (timings are approximate).</p>

<p>Calling <code>logrittr_logger$new()</code>:</p>

<figure class="highlight"><pre>library(lumberjack)
library(dplyr)

l &lt;- logrittr_logger$new(verbose = TRUE)
logfile &lt;- tempfile(fileext=&quot;.-r.log.csv&quot;)

iris %L&gt;%
   start_log(log = l, label = &quot;iris step&quot;) %L&gt;%
   as_tibble() %L&gt;%
   filter(Sepal.Length &lt; 5) %L&gt;%
   mutate(rn = row_number()) %L&gt;%
   group_by(Species) %L&gt;%
   summarise(n = n_distinct(rn)) %L&gt;%
   dump_log(file=logfile, stop = FALSE)
   

mtcars %&gt;% 
  start_log(log = l, label = &quot;mtcars step&quot;) %L&gt;%
   count() %L&gt;%
   dump_log(file=logfile, stop = TRUE)


logdata &lt;- read.csv(logfile)</pre></figure>

<p>This will write the logrittr log content of multiple data steps to the same CSV file.</p>

<h2 id="limitations">Limitations</h2>

<ul>
  <li>
<p>Like <code>tidylog</code>, logrittr only works with dplyr pipelines on in-memory R data.frames,
and cannot handle dbplyr pipelines on database tables (remote/lazy tables).</p>
  </li>
  <li>
<p>The join cardinalities nicely reported by tidylog are difficult to obtain from the pipe,
since the join is already done by the time the pipe sees it; for now we only show the row and column counts before and after each step.</p>
  </li>
  <li>
<p>Yes, it’s another pipe, which is not ideal. We can dream of a <code>with_logging(TRUE)</code> context that would activate the behaviour of the logrittr pipe in <code>|&gt;</code> or in <code>%&gt;%</code>.</p>
  </li>
</ul>

<h2 id="take-another-pipe-for-a-spin">Take another pipe for a spin</h2>

<p><code>logrittr</code> prioritizes the user experience with a structured and colorful display in the console.</p>

<p>For now, this package is just a proof of concept that gave me a chance to experiment a bit with the <code>cli</code> package and few other things. But I think there’s a need for that in R, in a specific area where SAS outputs are so informative.</p>

<ul>
  <li><a href="https://guillaumepressiat.r-universe.dev/logrittr" rel="nofollow" target="_blank">https://guillaumepressiat.r-universe.dev/logrittr</a></li>
  <li><a href="https://github.com/guillaumepressiat/logrittr" rel="nofollow" target="_blank">https://github.com/guillaumepressiat/logrittr</a></li>
</ul>


<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://guillaumepressiat.github.io/blog/2026/04/logrittr"> Guillaume Pressiat</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/logrittr-a-verbose-pipe-operator-for-logging-dplyr-pipelines/">logrittr: A Verbose Pipe Operator for Logging dplyr Pipelines</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400575</post-id>	</item>
		<item>
		<title>Dealing with correlation in designed field experiments: part II</title>
		<link>https://www.r-bloggers.com/2026/04/dealing-with-correlation-in-designed-field-experiments-part-ii-4/</link>
		
		<dc:creator><![CDATA[Andrea Onofri]]></dc:creator>
		<pubDate>Wed, 15 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.statforbiology.com/2026/correlation/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>With field experiments, studying the correlation between the observed traits may not be an easy task. For example, we can consider a genotype experiment, laid out in randomised complete blocks, with 27 wheat genotypes and three replicates, where sev...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/dealing-with-correlation-in-designed-field-experiments-part-ii-4/">Dealing with correlation in designed field experiments: part II</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.statforbiology.com/2026/correlation/"> R on Fixing the bridge between biologists and statisticians</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>



<p>With field experiments, studying the correlation between the observed traits may not be an easy task. For example, we can consider a genotype experiment, laid out in randomised complete blocks, with 27 wheat genotypes and three replicates, where several traits were recorded, including yield (Yield) and weight of thousand kernels (TKW). We might be interested in studying the correlation between those two traits, but we would need to face two fundamental problems:</p>
<ol style="list-style-type: decimal">
<li>the concept of correlation in such a setting is not unique, as we might either consider the correlation between the plot measurements, or the correlation between the residuals or the correlation between genotype means or the correlation between block means;</li>
<li>the experimental units are not independent, but they are grouped by genotype and block, which invalidates all inferences based on the independence assumption.</li>
</ol>
<p>I have dealt with these two problems (particularly the first one) <a href="https://www.statforbiology.com/2019/stat_general_correlationindependence1/" rel="nofollow" target="_blank">in a previous post</a>, where I gave a solution based on traditional methods of data analyses.</p>
<p>In this post, I would like to present a more advanced solution, that was advocated by Hans-Peter Piepho in a relatively recent manuscript (Piepho, 2018). Such a solution is based on mixed models and it was implemented in SAS, by using PROC MIXED. I spent a few hours ‘transporting’ those models to R, which turned out to be a difficult task, although, in the end, I seem to have come to an acceptable solution, which I would like to share here.</p>
<p>First of all, we can load the ‘WheatQuality’ dataset, that is available in the ‘statforbiology’ package; it consists of 81 records (plots) of 6 variables, i.e. the Genotype and Block factors, as well as the four responses height, TKW, weight per hectolitre and yield. The code below loads the necessary packages, the data and transforms the numeric variable ‘Block’ into a factor.</p>
<pre>rm(list = ls())
library(statforbiology)
library(dplyr)
library(tidyr)
library(sommer)
library(nlme)
#
# Loading data
dataset &lt;- getAgroData(&quot;WheatQuality&quot;) |&gt;
  mutate(Block = factor(Block),
         Genotype = factor(Genotype))
head(dataset)
##     Genotype Block Height  TKW Whectol Yield
## 1 arcobaleno     1     90 44.5    83.2 64.40
## 2 arcobaleno     2     90 42.8    82.2 60.58
## 3 arcobaleno     3     88 42.7    83.1 59.42
## 4       baio     1     80 40.6    81.8 51.93
## 5       baio     2     75 42.7    81.3 51.34
## 6       baio     3     76 41.1    81.1 47.78</pre>
<p>Piepho (2018) showed that, for an experiment like this one, all the correlation coefficients can be estimated by coding a multi-response mixed model, as follows:</p>
<p><span class="math display">\[ Y_{ijk} = \mu_i + \beta_{ik} + \tau_{ij} + \epsilon_{ijk}\]</span></p>
<p>where <span class="math inline">\(Y_{ijk}\)</span> is the response for the trait <span class="math inline">\(i\)</span>, the genotype <span class="math inline">\(j\)</span> and the block <span class="math inline">\(k\)</span>, <span class="math inline">\(\mu_i\)</span> is the mean for the trait <span class="math inline">\(i\)</span>, <span class="math inline">\(\beta_{ik}\)</span> is the effect of the block <span class="math inline">\(k\)</span> and trait <span class="math inline">\(i\)</span>, <span class="math inline">\(\tau_{ij}\)</span> is the effect of genotype <span class="math inline">\(j\)</span> for the trait <span class="math inline">\(i\)</span> and <span class="math inline">\(\epsilon_{ijk}\)</span> is the residual for the trait <span class="math inline">\(i\)</span>, the genotype <span class="math inline">\(j\)</span> and the block <span class="math inline">\(k\)</span>.</p>
<p>In the above model, the residuals <span class="math inline">\(\epsilon_{ijk}\)</span> need to be normally distributed and heteroscedastic, with trait-specific variances. Furthermore, residuals belonging to the same plot (the two observed values for the two traits) need to be correlated (correlation of errors).</p>
<p>Hans-Peter Piepho, in his paper, put forward the idea that the ‘genotype’ and ‘block’ effects for the two traits can be taken as random, which makes sense, because, for this application, we are mainly interested in variances and covariances. Both random effects (for the genotype and for the block terms) need to be heteroscedastic (trait-specific variance components) and there must be a correlation between the two traits.</p>
<p>It should be noted that, for other applications, the genotype and block effects (especially the former) might be better regarded as fixed, but we will not pursue such an idea in this post.</p>
<div id="fitting-a-bivariate-model" class="section level1">
<h1>Fitting a bivariate model</h1>
<p>To the best of my knowledge, there is no way to fit such a complex model with the ‘nlme’ package and the related ‘lme()’ function (I’ll give a hint later on, for a simpler model). In a previous post at <a href="https://www.statforbiology.com/2019/stat_general_correlationindependence2_asreml/" rel="nofollow" target="_blank">this link</a>, I have given a solution based on the ‘asreml’ package (Butler et al., 2018), but this is a paid option. In more recent times I have discovered the ‘sommer’ package (Covarrubias-Pazaran, 2016), which seems to be very useful and is suitable for the requirements of this post. The key function of ‘sommer’ is <code>mmer()</code>, and, in order to fit the above model, we need to specify the following components.</p>
<ol style="list-style-type: decimal">
<li>The response variables. When we set a bivariate model with ‘sommer’, we can ‘cbind()’ Yield and TKW.</li>
<li>The fixed model, that does not contain any effects but the intercept (by default, the means for the two traits are separately estimated, as in Piepho, 2018).</li>
<li>The random model, that is composed of the ‘genotype’ and ‘block’ effects. For both, I specified a general unstructured variance-covariance matrix, so that we can estimate two different variance components (one per trait) and one covariance component. The resulting coding is ‘~ vsr(usr(Genotype)) + vsr(usr(Block))’.</li>
<li>The residual structure, where the two observations in the same plot are heteroscedastic and correlated. This structure is fitted by default and it does not require any specific coding.</li>
</ol>
<p>The model call is:</p>
<pre>mod.bimix &lt;- mmer(cbind(Yield, TKW) ~ 1,
                   random = ~ vsr(usr(Genotype)) + vsr(usr(Block)),
                   rcov = ~ vsr(units),  
                   data = dataset,
                  verbose = FALSE, dateWarning = FALSE)
bimix.obj &lt;- summary(mod.bimix)
coefs &lt;- bimix.obj$varcomp
coefs
##                           VarComp  VarCompSE     Zratio Constraint
## u:Genotype.Yield-Yield 77.6342608 22.0978545  3.5132035   Positive
## u:Genotype.Yield-TKW   38.8320973 15.0930429  2.5728475   Unconstr
## u:Genotype.TKW-TKW     53.8613303 15.3539585  3.5079768   Positive
## u:Block.Yield-Yield     3.7104682  3.9363372  0.9426195   Positive
## u:Block.Yield-TKW      -0.2428322  1.9074202 -0.1273092   Unconstr
## u:Block.TKW-TKW         1.6684549  1.8343512  0.9095613   Positive
## u:units.Yield-Yield     6.0939217  1.1951482  5.0988836   Positive
## u:units.Yield-TKW       0.1635821  0.7242898  0.2258518   Unconstr
## u:units.TKW-TKW         4.4718011  0.8770118  5.0989065   Positive</pre>
<p>The box above shows the estimates of the variance-covariance parameters. In order to get the correlations, I used the delta method, as implemented in the <code>gnlht()</code> function of the ‘statforbiology’ package (the accompanying package for this blog). First of all, I extracted the variance parameters, together with their covariance matrix, from the mixed model object. For simplicity, I assigned simple names to the coefficients (V1, V2, … Vn), according to their ordering in the model output.</p>
<pre># Correlation between genotype means
coefsVec &lt;- coefs[,1]
vcovMat &lt;- mod.bimix$sigmaSE # Variance-covariance for varcomp
names(coefsVec) &lt;- paste(&quot;V&quot;, 1:9, sep = &quot;&quot;)
gnlht(coefsVec, func = list(~ V2 / (sqrt(V1)*sqrt(V3) )),
      vcov. = as.matrix(vcovMat),
      parameterNames = paste(&quot;V&quot;, 1:9, sep = &quot;&quot;))
##                       Form  Estimate        SE  Z-value      p-value
## 1 V2/(sqrt(V1) * sqrt(V3)) 0.6005174 0.1306699 4.595684 4.313326e-06
#
# Correlation between block means
gnlht(coefsVec, func = list(~ V5 / (sqrt(V4)*sqrt(V6) ) ),
      vcov. = as.matrix(vcovMat),
      parameterNames = paste(&quot;V&quot;, 1:9, sep = &quot;&quot;))
##                       Form    Estimate        SE   Z-value   p-value
## 1 V5/(sqrt(V4) * sqrt(V6)) -0.09759658 0.7571256 0.1289041 0.8974335
#
# Correlation of residuals
gnlht(coefsVec, func = list(~ V8 / (sqrt(V7)*sqrt(V9) )),
      vcov. = as.matrix(vcovMat),
      parameterNames = paste(&quot;V&quot;, 1:9, sep = &quot;&quot;))
##                       Form   Estimate        SE   Z-value   p-value
## 1 V8/(sqrt(V7) * sqrt(V9)) 0.03133619 0.1385421 0.2261854 0.8210572</pre>
<p>We see that the estimates are very close to those obtained by using Pearson’s correlation coefficients (see my previous post). The advantage of this mixed model solution is that we can also test hypotheses in a relatively reliable way. For example, we can look at the Wald tests in the output above to judge the significance of the correlations and conclude that only the genotype means are significantly correlated with one another.</p>
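<p>The delta method that <code>gnlht()</code> applies can be sketched by hand in base R. In the sketch below, the variance-component estimates are taken from the model output above, but the covariance matrix <code>S</code> is purely illustrative (its diagonal matches the squared standard errors, while the off-diagonal entries are invented); in a real analysis it would come from the mixed model object.</p>

```r
# Delta-method sketch for the correlation r = V2 / sqrt(V1 * V3)
v <- c(V1 = 77.6342608, V2 = 38.8320973, V3 = 53.8613303)

# Illustrative covariance matrix of (V1, V2, V3): diagonal from the
# reported SEs, off-diagonal entries invented for the sake of the example
S <- matrix(c(488.3, 150.0,  60.0,
              150.0, 227.8,  90.0,
               60.0,  90.0, 235.7), 3, 3)

r <- unname(v["V2"] / sqrt(v["V1"] * v["V3"]))  # point estimate (~0.6005)
g <- c(-r / (2 * v["V1"]),                      # dr/dV1
        1 / sqrt(v["V1"] * v["V3"]),            # dr/dV2
       -r / (2 * v["V3"]))                      # dr/dV3
se <- sqrt(drop(t(g) %*% S %*% g))              # first-order (delta) SE
c(estimate = r, SE = se)
```

<p>With the true covariance matrix of the variance components in place of <code>S</code>, this reproduces the standard error reported by <code>gnlht()</code>.</p>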
</div>
<div id="a-solution-with-lme" class="section level1">
<h1>A solution with ‘lme()’</h1>
<p>Considering that the block means are not correlated, if we were willing to take the ‘block’ effect as fixed, we could also fit a bivariate mixed model with the ‘nlme’ package and the function <code>lme()</code> (Pinheiro and Bates, 2000). However, we need to cast the model as a ‘univariate’ model.</p>
<p>To this end, the two variables (Yield and TKW) need to be stacked and a new factor (‘Trait’) needs to be introduced to identify the observations for the two traits. Another factor is also necessary to identify the different plots, i.e. the observational units. To perform such a restructuring, I used the <code>pivot_longer()</code> function in the ‘tidyr’ package (Wickham et al., 2024) and assigned the name ‘Y’ to the response variable, which now contains the two original variables Yield and TKW, one on top of the other.</p>
<pre>dataset$Plot &lt;- 1:81
mdataset &lt;- dataset |&gt;
  select(-Whectol, -Height) |&gt;
  pivot_longer(names_to = &quot;Trait&quot;, values_to = &quot;Y&quot;, cols = c(&quot;Yield&quot;, &quot;TKW&quot;)) |&gt;
  mutate(Trait = factor(Trait))
head(mdataset)
## # A tibble: 6 × 5
##   Genotype   Block  Plot Trait     Y
##   &lt;fct&gt;      &lt;fct&gt; &lt;int&gt; &lt;fct&gt; &lt;dbl&gt;
## 1 arcobaleno 1         1 Yield  64.4
## 2 arcobaleno 1         1 TKW    44.5
## 3 arcobaleno 2         2 Yield  60.6
## 4 arcobaleno 2         2 TKW    42.8
## 5 arcobaleno 3         3 Yield  59.4
## 6 arcobaleno 3         3 TKW    42.7
tail(mdataset)
## # A tibble: 6 × 5
##   Genotype Block  Plot Trait     Y
##   &lt;fct&gt;    &lt;fct&gt; &lt;int&gt; &lt;fct&gt; &lt;dbl&gt;
## 1 vitromax 1        79 Yield  54.4
## 2 vitromax 1        79 TKW    41.6
## 3 vitromax 2        80 Yield  51.0
## 4 vitromax 2        80 TKW    43.6
## 5 vitromax 3        81 Yield  48.8
## 6 vitromax 3        81 TKW    43.1</pre>
<p>The fixed model is:</p>
<pre>Y ~ Trait*Block</pre>
<p>In order to introduce a totally unstructured variance-covariance matrix for the random effect, I used the ‘pdMat’ construct:</p>
<pre>random = list(Genotype = pdSymm(~Trait - 1))</pre>
<p>As for the residuals, heteroscedasticity can be included by using the ‘weights’ argument and the ‘varIdent’ variance function, which allows a different standard deviation per trait:</p>
<pre>weight = varIdent(form = ~1|Trait)</pre>
<p>I also allowed the residuals to be correlated within each plot, by using the ‘correlation’ argument and specifying a general ‘corSymm()’ correlation form (with only two observations per plot, this is equivalent to the compound-symmetry form ‘corCompSymm()’). Plots are nested within genotypes, thus I used a nesting operator, as follows:</p>
<pre>correlation = corSymm(form = ~1|Genotype/Plot)</pre>
<p>The final model call is:</p>
<pre>mod &lt;- lme(Y ~ Trait*Block, 
                 random = list(Genotype = pdSymm(~Trait - 1)),
                 weight = varIdent(form=~1|Trait), 
                 correlation = corCompSymm(form=~1|Genotype/Plot),
                 data = mdataset)</pre>
<p>Retrieving the variance-covariance matrices requires some effort, as the function ‘getVarCov()’ does not appear to work properly with this multi-stratum model. First of all, we can retrieve the variance-covariance matrix for the genotype random effect (G) and the corresponding correlation matrix.</p>
<pre>#Variance structure for random effects
obj &lt;- mod
G &lt;- matrix( as.numeric(getVarCov(obj)), 2, 2 )
G
##          [,1]     [,2]
## [1,] 53.86053 38.83124
## [2,] 38.83124 77.63402
cov2cor(G)
##           [,1]      [,2]
## [1,] 1.0000000 0.6005096
## [2,] 0.6005096 1.0000000</pre>
<p>Second, we can retrieve the ‘conditional’ variance-covariance matrix (R), which describes the correlation of errors:</p>
<pre>#Conditional variance-covariance matrix (residual error)
V &lt;- corMatrix(obj$modelStruct$corStruct)[[1]] #Correlation for residuals
sds &lt;- 1/varWeights(obj$modelStruct$varStruct)[1:2]
sds &lt;- obj$sigma * sds #Standard deviations for residuals (one per trait)
R &lt;- t(V * sds) * sds #Going from correlation to covariance
R
##           [,1]      [,2]
## [1,] 6.0939152 0.1634968
## [2,] 0.1634968 4.4718077
cov2cor(R)
##            [,1]       [,2]
## [1,] 1.00000000 0.03131984
## [2,] 0.03131984 1.00000000</pre>
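<p>The step from correlation to covariance above relies on pre- and post-multiplying by the standard deviations (i.e. <code>diag(sds) %*% V %*% diag(sds)</code>); the one-liner <code>t(V * sds) * sds</code> is just a compact way of writing it. A minimal check with made-up numbers:</p>

```r
# Verify that t(V * sds) * sds equals diag(sds) %*% V %*% diag(sds)
V   <- matrix(c(1, 0.0313, 0.0313, 1), 2, 2)  # a 2x2 correlation matrix
sds <- c(2.47, 2.11)                          # made-up standard deviations
R1  <- t(V * sds) * sds                       # compact form used above
R2  <- diag(sds) %*% V %*% diag(sds)          # explicit form
all.equal(R1, R2)
```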
<p>The total variance-covariance matrix is simply obtained as the sum of G and R, and the total correlation follows:</p>
<pre>Tr &lt;- G + R
cov2cor(Tr)
##          [,1]     [,2]
## [1,] 1.000000 0.555787
## [2,] 0.555787 1.000000</pre>
<p>We see that the same results can be obtained by using ‘sommer’ and regarding the block effect as fixed, although the coding below is much neater!</p>
<pre>mod.bimix5 &lt;- mmer(cbind(Yield, TKW) ~ Block,
                   random = ~ vsr(usr(Genotype)),
                   data = dataset,
                  verbose = FALSE, dateWarning = FALSE)
mod.bimix5$sigma
## $`u:Genotype`
##          Yield      TKW
## Yield 77.63425 38.83209
## TKW   38.83209 53.86133
## 
## $units
##           Yield       TKW
## Yield 6.0939217 0.1635824
## TKW   0.1635824 4.4718011</pre>
<p>Hope this was useful… should you have any better solutions, I’d be happy to learn them; please, drop me a line at the address below. Thanks for reading and happy coding!</p>
<p>And … don’t forget to check out my new book!</p>
<p>Prof. Andrea Onofri<br />
Department of Agricultural, Food and Environmental Sciences<br />
University of Perugia (Italy)<br />
Send comments to: <a href="mailto:andrea.onofri@unipg.it" rel="nofollow" target="_blank">andrea.onofri@unipg.it</a></p>
<p><a href = "https://www.awin1.com/cread.php?awinmid=26429&#038;awinaffid=2675822&#038;ued=https%3A%2F%2Flink.springer.com%2Fbook%2F10.1007%2F978-3-032-08199-5"><img src="https://i1.wp.com/www.statforbiology.com/_Figures/Email_Signature_978-3-032-08199-5.png?w=578&#038;ssl=1" align="center" alt="Book cover" class="cover" data-recalc-dims="1" /></a></p>
<hr />
</div>
<div id="references" class="section level1">
<h1>References</h1>
<ol style="list-style-type: decimal">
<li>Butler, D., Cullis, B.R., Gilmour, A., Gogel, B., Thomson, R., 2018. ASReml-r reference manual - version 4. UK.</li>
<li>Covarrubias-Pazaran, G., 2016. Genome-Assisted Prediction of Quantitative Traits Using the R Package sommer. PLOS ONE 11, e0156744. <a href="https://doi.org/10.1371/journal.pone.0156744" class="uri" rel="nofollow" target="_blank">https://doi.org/10.1371/journal.pone.0156744</a></li>
<li>Piepho, H.-P., 2018. Allowing for the structure of a designed experiment when estimating and testing trait correlations. The Journal of Agricultural Science 156, 59–70.</li>
<li>Pinheiro, J.C., Bates, D.M., 2000. Mixed-effects models in s and s-plus. Springer-Verlag Inc., New York.</li>
<li>Wickham H, Vaughan D, Girlich M (2024). <em>tidyr: Tidy Messy Data</em>.doi:10.32614/CRAN.package.tidyr <a href="https://doi.org/10.32614/CRAN.package.tidyr" class="uri" rel="nofollow" target="_blank">https://doi.org/10.32614/CRAN.package.tidyr</a>, R package version 1.3.1, <a href="https://cran.r-project.org/package=tidyr" class="uri" rel="nofollow" target="_blank">https://CRAN.R-project.org/package=tidyr</a>.</li>
</ol>
<hr />
<p>This post was originally published on 2025-02-10 and updated on 2026-04-15</p>
</div>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.statforbiology.com/2026/correlation/"> R on Fixing the bridge between biologists and statisticians</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/dealing-with-correlation-in-designed-field-experiments-part-ii-4/">Dealing with correlation in designed field experiments: part II</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400601</post-id>	</item>
		<item>
		<title>Programming with LLMs in R &#038; Python</title>
		<link>https://www.r-bloggers.com/2026/04/programming-with-llms-in-r-python/</link>
		
		<dc:creator><![CDATA[The Jumping Rivers Blog]]></dc:creator>
		<pubDate>Tue, 14 Apr 2026 23:59:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.jumpingrivers.com/blog/programming-llms-r-python/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>Working with LLMs in Practice<br />
Large Language Models are becoming part of everyday data science work. But using them through chat interfaces is only one part of the picture.<br />
In this upcoming webinar, we focus on how to work with LLMs programmatica...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/programming-with-llms-in-r-python/">Programming with LLMs in R & Python</a>]]></description>
					<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.jumpingrivers.com/blog/programming-llms-r-python/"> The Jumping Rivers Blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>
<a href = "https://www.jumpingrivers.com/blog/programming-llms-r-python/">
<img src="https://i0.wp.com/www.jumpingrivers.com/blog/programming-llms-r-python/featured.png?w=400&#038;ssl=1" style="width:400px; display: block; margin: auto;" class="image-center" data-recalc-dims="1" />
</a>
</p>
<h2 id="working-with-llms-in-practice">Working with LLMs in Practice</h2>
<p>Large Language Models are becoming part of everyday data science work. But using them through chat interfaces is only one part of the picture.</p>
<p>In this upcoming webinar, we focus on how to work with LLMs programmatically, using R and Python to integrate them into real workflows and applications.</p>
<blockquote>
<p>Secure your place by registering through the <a href="https://jumpingrivers.typeform.com/to/UmdyNbAs" rel="nofollow" target="_blank">webinar registration form</a></p>
</blockquote>
<h2 id="what-well-cover">What We’ll Cover</h2>
<p>We begin with a short introduction to how LLMs work, including how they are priced, where they perform well, and where they can fall short.</p>
<p>From there, the session moves into practical examples of working with LLMs in code:</p>
<ul>
<li>Sending prompts to an LLM API from R using the <a href="https://ellmer.tidyverse.org/" rel="nofollow" target="_blank">{ellmer}</a> package</li>
<li>Including additional instructions through system prompts</li>
<li>Structuring prompts to return clean, tabular outputs</li>
<li>Summarising images and PDFs using LLMs</li>
</ul>
<p>While the examples will focus primarily on R, we will also briefly explore the <a href="https://posit-dev.github.io/chatlas/" rel="nofollow" target="_blank">{chatlas}</a> package for Python, which offers similar functionality.</p>
<h2 id="why-this-matters">Why This Matters</h2>
<p>Using LLMs through chat tools is useful for exploration, but it has limits.</p>
<p>For data scientists and developers, the value comes from:</p>
<ul>
<li>
<p>Automating repetitive tasks</p>
</li>
<li>
<p>Embedding LLMs into applications and pipelines</p>
</li>
<li>
<p>Generating structured outputs that can be reused downstream</p>
</li>
</ul>
<p>This webinar focuses on that shift, from interactive use to integration in code.</p>
<h2 id="who-should-attend">Who Should Attend</h2>
<p>This webinar is suitable for:</p>
<ul>
<li>
<p>Data scientists working with R or Python</p>
</li>
<li>
<p>Developers interested in integrating AI into applications</p>
</li>
<li>
<p>Teams exploring how to move from experimentation to production</p>
</li>
</ul>
<p>No prior experience with LLM APIs is required, but familiarity with R or Python will be helpful.</p>
<h2 id="webinar-details">Webinar Details</h2>
<ul>
<li><strong>Date:</strong> 23rd April 2026</li>
<li><strong>Time:</strong> 1:15 PM (UK time)</li>
<li><strong>Location:</strong> Online</li>
<li><strong>Cost:</strong> Free</li>
</ul>
<h2 id="speaker">Speaker</h2>
<p>The session will be led by <a href="https://www.linkedin.com/in/myles-mitchell-4009aa98/" rel="nofollow" target="_blank">Myles Mitchell</a>, Principal Data Scientist at Jumping Rivers.</p>
<h2 id="related-jumping-rivers-training">Related Jumping Rivers Training</h2>
<p>If you would like to explore these topics further, our 6-hour course, <a href="https://www.jumpingrivers.com/training/course/llm-applications-r-python-shiny-rag-ai/" rel="nofollow" target="_blank">LLM-Driven Applications with R and Python</a> covers:</p>
<ul>
<li>Building LLM-powered dashboards</li>
<li>Setting up retrieval-augmented generation (RAG) pipelines</li>
<li>Responsible use of AI</li>
</ul>
<h2 id="join-us">Join Us</h2>
<p>LLMs are quickly becoming part of the standard toolkit for data science.</p>
<p>Understanding how to use them programmatically opens up far more possibilities than using them through chat alone.</p>
<p>This session is designed to give you a clear starting point.</p>
<aside class="advert">
<p>
Join us for our AI in Production conference! For more details, check out our
<a href="https://ai-in-production.jumpingrivers.com/" rel="nofollow" target="_blank">conference website!</a>
</p>
</aside>
<p>
For updates and revisions to this article, see the <a href = "https://www.jumpingrivers.com/blog/programming-llms-r-python/">original post</a>
</p>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.jumpingrivers.com/blog/programming-llms-r-python/"> The Jumping Rivers Blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/programming-with-llms-in-r-python/">Programming with LLMs in R & Python</a>]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">400537</post-id>	</item>
		<item>
		<title>Marathon Man II: how to pace a marathon</title>
		<link>https://www.r-bloggers.com/2026/04/marathon-man-ii-how-to-pace-a-marathon/</link>
		
		<dc:creator><![CDATA[Stephen Royle]]></dc:creator>
		<pubDate>Tue, 14 Apr 2026 12:30:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://quantixed.org/?p=3760</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> It’s often the way. I posted recently about how to pace a marathon and very quickly received feedback that would’ve improved the original post. Oh well, no going back. This is take two. So, we have a dataset of all runners from the 2025 New York City Marathon. We ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/marathon-man-ii-how-to-pace-a-marathon/">Marathon Man II: how to pace a marathon</a>]]></description>
					<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://quantixed.org/2026/04/14/marathon-man-ii-how-to-pace-a-marathon/"> Rstats – quantixed</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>It’s often the way. I posted recently about <a href="https://quantixed.org/2026/04/06/marathon-man-how-to-pace-a-marathon/" data-type="post" data-id="3654" rel="nofollow" target="_blank">how to pace a marathon</a> and very quickly received feedback that would’ve improved the original post. Oh well, no going back. This is take two.</p>



<p>So, we have a dataset of all runners from the 2025 New York City Marathon. We want to know how you should pace a marathon. <strong>What is the best strategy?</strong></p>



<p>Determining your optimal pace is complex. There’s the theoretical pace that you can achieve – a mix of biomechanics, physiology and training – but it can be very hard to know what this pace is. Anyway, this theoretical pace is what you <em>could</em> achieve when all goes well. You need to factor in the conditions on the day – how you slept, how you fuel, mental attitude, is it windy? can you get in a group and work with others? and so on. A runner may toe the line in the shape to run a sub 3 h marathon, but by the 30 km mark, the story may be very different.</p>



<p>In the last post, we saw that positive splitting (otherwise known as slowing down) is inevitable. So it seems the best strategy is to start out faster than your goal pace and bank some time to account for the fade.</p>



<p>A reader responded with this insightful comment:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>What I question, though, is whether a (very thorough) analysis of how marathons <em>get</em> run tells us much about how they <em>should be</em> run? This seems to be saying, “Forget about an <em>optimal</em> pace, here’s how to compensate for the <em>sub-optimal</em> pace you’re going to run despite your plans.”</p>
</blockquote>



<p>This is correct. Any <em>post hoc</em> analysis like this can only tell us how the marathon <em>was</em> run, not how it <em>should</em> be run. This is because we don’t know the intention of any runner in the dataset. If we did, then we would know how a runner intended to run the race (i.e. what their pacing strategy was) and then we could ask: did that work out for them?</p>



<p>If only we knew their intention… hmm…</p>



<h2 class="wp-block-heading">The idea</h2>



<p>The sub-3 marathon is one of the big goals in running. That is, trying to run it in less than 3 hours. So we know that there are a bunch of runners in the dataset trying to do just that. We know the finish times too. So by definition, the runners finishing between 02:55:00 and 03:00:00 were the folks shooting for sub-3 and who achieved it, while those finishing between 03:00:00 and 03:05:00 were those who didn’t make it. Sure, there will be some in this window who were hoping for 02:50:00 and failed and some who were hoping to do 03:10:00 and ran amazingly well. But by narrowing the window to 5 min either side of 3 h, we have fewer of those than if we took 10 min either side.</p>



<p><strong>If we assume that runners in the 02:55:00 to 03:05:00 finishing window intended to run for a finish time of 3 h, we can analyse how they paced the marathon and how it worked out for them.</strong></p>
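<p>That selection is easy to code. Here is a base-R sketch, where the "HH:MM:SS" finish times are toy values and the variable names are mine, not those of the actual dataset:</p>

```r
# Keep finishers within 5 min either side of 3 h and split them
# into sub-3 (made it) and over-3 (missed it)
to_sec <- function(x) {                      # "HH:MM:SS" -> seconds
  p <- as.numeric(strsplit(x, ":")[[1]])
  p[1] * 3600 + p[2] * 60 + p[3]
}
finish <- c("02:51:10", "02:57:30", "03:02:45", "03:12:00")  # toy data
secs   <- vapply(finish, to_sec, numeric(1))
window <- secs >= to_sec("02:55:00") & secs <= to_sec("03:05:00")
sub3   <- window & secs < to_sec("03:00:00")   # made the goal
over3  <- window & !sub3                       # missed it
finish[window]   # "02:57:30" "03:02:45"
```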



<p>Analysing this window also has the advantage that runners of this calibre know how to pace well, compared to those trying for 03:30:00 or 04:00:00. There are also plenty of them, given that it is a popular goal.</p>



<p>So let’s take a look. Plots first and then the <a href="https://quantixed.org/2026/04/14/marathon-man-ii-how-to-pace-a-marathon/#the-code" rel="nofollow" target="_blank">code below</a>.</p>



<h2 class="wp-block-heading">Going for sub-3</h2>



<p>We’ll use the difference from goal pace to visualise runners’ progress. The goal pace here is ~04:15/km. Being below 0 means running ahead of pace (banking time), and being above 0 means being behind schedule.</p>



<p>We colour the runners by whether they made it sub-3 (red) or failed and went over 3 (blue).</p>



<figure class="wp-block-image size-large"><img loading="lazy" fetchpriority="high" decoding="async" src="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/04/sub_over_3_comparison-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3761" sizes="(max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /></figure>



<p>770 runners were sub 3 whereas 628 were over 3. This can be difficult to see, so let’s take a different view.</p>



<figure class="wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex">
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" data-id="3762" src="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/over_3_comparison-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3762" sizes="(max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /></figure>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" data-id="3763" src="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/sub_3_comparison-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3763" sizes="(max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /></figure>
</figure>



<p>For both outcomes we have several runners who were clearly shooting for a faster time, but something went wrong and they ended up in our window. They appear as U-shapes in the difference plots. Rather than remove them, we’ll accept these contaminants and assume that most folks in this window are shooting for a 3 h finish.</p>



<p>We can see different pacing as the race progresses for different runners. Some folks are behind schedule but end up making sub-3, others are ahead of time and fail. To answer our question we need to know: <strong>what is the best strategy</strong>?</p>



<h3 class="wp-block-heading">On-pace, positive split, negative split?</h3>



<p>We’ll take 10 km as our marker point. It’s almost one-quarter of the way through. Any excitement of the start, with all the crowds messing up pacing, is done, and we can see at this point who is intending to run at what pace. Let’s say that “on pace” is within 2 s/km of goal pace. So at 10 km, an “on pace” runner could be ± 20 s from where they should be (~00:42:40). If they are more than 20 s behind that point, we’ll say they are behind pace; if they are more than 20 s ahead of it, they are ahead of pace.</p>
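<p>Written out as code, with invented 10 km splits, the classification looks like this (a sketch, not the script used for the plots):</p>

```r
# Classify runners at the 10 km waypoint against a 3 h goal,
# using a +/- 2 s/km tolerance (i.e. +/- 20 s at 10 km)
goal_pace <- (3 * 3600) / 42.195   # goal pace in s/km (~255.96, ~04:16/km)
goal_10k  <- goal_pace * 10        # ~2559.5 s, i.e. about 00:42:40
tol       <- 2 * 10                # 2 s/km over 10 km

split_10k <- c(2500, 2555, 2600)   # invented 10 km split times (seconds)
diff_s    <- split_10k - goal_10k
category  <- ifelse(diff_s < -tol, "ahead",
             ifelse(diff_s > tol, "behind", "on pace"))
category   # "ahead" "on pace" "behind"
```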



<p>Knowing this, we can look at the outcome. Of the runners going for 3 h, what was the best strategy?</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" src="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/pacing_by_final_category-1024x614.png?w=450&#038;ssl=1" alt="" class="wp-image-3764" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /></figure>



<p>We can see that <strong>most people do not negative split a sub-3 marathon</strong>. The majority of people making the goal run the first 10 km (and indeed most of the race) <em>ahead</em> of goal pace.</p>



<p>There’s a risk here, though: going out faster than goal pace means that you might fail. The yellow traces really show how, at 30-35 km, the race gets very tough and people can slow down significantly. Anyone who has run a marathon will tell you that “the race only starts at the 30 km mark”. It’s where people start to hit the wall, and this plot really shows that. These folks could have misjudged their theoretical best pace or just struggled on this occasion.</p>



<p>I find the strategy for success interesting. A lot of advice out there is to start a marathon at an easier pace and speed up if you can. While it’s true you shouldn’t go too fast and blow up, the advice should be to <strong>train to run more than 2 s/km ahead of goal pace</strong> and try to maintain that.</p>



<h3 class="wp-block-heading">Tell me the odds</h3>



<p>With all the <a href="https://quantixed.org/2026/04/14/marathon-man-ii-how-to-pace-a-marathon/#caveats" rel="nofollow" target="_blank">caveats in place</a>, let’s try to get some individual-level probabilities from our population-level data.</p>



<p>We looked at the 10 km point, applying a ± 2 s/km threshold for goal pace, and the behind/ahead classifications. We can do this for every waypoint that we have data for. Then, for a given waypoint, we can ask: of the runners that were, say, ahead of pace, how many finished sub-3 (succeeded) and how many finished over-3 (failed)? This gives us a probability of success for that strategy at that waypoint. We can then plot these probabilities out.</p>
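<p>The counting step can be sketched like this (a toy example with made-up runners, not the full pipeline; within each 10 km pacing class we compute the share of each final outcome):</p>
<pre>
library(dplyr)

# toy data: final outcome and 10 km pacing class for six runners
toy &lt;- data.frame(
  Category = c(&quot;Sub 3&quot;, &quot;Sub 3&quot;, &quot;Over 3&quot;, &quot;Sub 3&quot;, &quot;Over 3&quot;, &quot;Over 3&quot;),
  par_category = c(&quot;Ahead&quot;, &quot;Ahead&quot;, &quot;Ahead&quot;,
                   &quot;Goal Pace&quot;, &quot;Goal Pace&quot;, &quot;Behind&quot;)
)

# within each pacing class, what share of runners finished sub-3?
toy %&gt;%
  count(par_category, Category) %&gt;%
  group_by(par_category) %&gt;%
  mutate(percentage = n / sum(n) * 100)
</pre>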



<figure data-wp-context="{&quot;imageId&quot;:&quot;69de59868bd50&quot;}" data-wp-interactive="core/image" data-wp-key="69de59868bd50" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/probability_of_success_sub_3-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3765" srcset_temp="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/probability_of_success_sub_3-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/probability_of_success_sub_3-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/probability_of_success_sub_3-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/probability_of_success_sub_3-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/probability_of_success_sub_3-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p>If your strategy was to go ahead of pace and you were ahead of pace at 5 km, you have a 65% chance of going sub-3. If you are ahead of pace at 30 km, it climbs to a 72% chance. Naturally, it keeps climbing towards certain success the further the race progresses.</p>



<p>Running at goal pace gives a 50/50 chance of making it if you’re on pace at 5 km. But if you are still only on pace at the halfway point, your chance of success drops to 37%.</p>



<p>If you are behind pace at 10 km, you have a 19% chance of success, and this probability drops as the race continues. Eventually, we hit the point where it is not possible to make up the lost time and failure is certain.</p>
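<p>A back-of-the-envelope calculation (hypothetical numbers, not taken from the dataset) shows why a deficit eventually becomes unrecoverable:</p>
<pre>
goal_pace &lt;- 3 * 3600 / 42.195   # sub-3 goal pace, about 256 s/km
deficit &lt;- 120                   # suppose you are 120 s behind at 40 km
remaining &lt;- 42.195 - 40         # about 2.2 km to go
required &lt;- goal_pace - deficit / remaining
required                         # about 201 s/km, i.e. roughly 3:21/km
</pre>
<p>Making up two minutes over the final couple of kilometres would mean closing at near-elite pace, which is why the behind-pace probability collapses to zero.</p>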



<h2 class="wp-block-heading">The best strategy?</h2>



<p>The best strategy is to go out faster than goal pace and this is what you should train for.</p>



<p>Negative splitting is rare. Slowing down after 30 km is highly likely. Failing to account for this means potentially missing out on your goal.</p>



<p>This message is not too different from the previous post, but we now have some probabilities to back up advice on how the race <em>should</em> be run.</p>



<h2 class="wp-block-heading" id="caveats">Caveats</h2>



<p>Most people running a marathon are first-timers who will run this one race and their goal is to simply finish. Let’s face it, most non-runners have no idea whether your finish time was good/bad/whatever. They will just be impressed that you finished! This post is intended for repeat offenders who strive to improve their time. Maybe the best advice is to just go out there, enjoy running your marathon and not worry about pacing. It’s the best feeling in the world to have achieved it whether it’s your first or fifth.</p>



<p>This analysis is obviously limited to one dataset, the 2025 New York City Marathon. It has a flat profile, so any of the probabilities will likely only apply over a similarly flat course in similar conditions. I also mentioned that we assume a 3 h goal for the runners in the window and we saw how this is not perfect, but it is the best we can do. Obviously, the pacing for other goal times may be different, but we saw in the previous analysis that positive splitting is the most likely scenario regardless of pace.</p>



<h2 class="wp-block-heading" id="the-code">The code</h2>


<pre>
library(ggplot2)
library(ggtext)
library(dplyr)
library(hms)

## plot styling ----

# qBrand plot styling (theme_q) is used below. This code should run OK without it

my_colours &lt;- c(&quot;Sub 3 - Behind&quot; = &quot;#003d5c&quot;,
                &quot;Sub 3 - Goal Pace&quot; = &quot;#954e9b&quot;,
                &quot;Sub 3 - Ahead&quot; = &quot;#ff6b59&quot;,
                &quot;Over 3 - Behind&quot; = &quot;#464c89&quot;,
                &quot;Over 3 - Goal Pace&quot; = &quot;#dd4d88&quot;,
                &quot;Over 3 - Ahead&quot; = &quot;#ffa600&quot;)

my_levels &lt;- c(&quot;Sub 3 - Behind&quot;,
               &quot;Sub 3 - Goal Pace&quot;,
               &quot;Sub 3 - Ahead&quot;,
               &quot;Over 3 - Behind&quot;,
               &quot;Over 3 - Goal Pace&quot;,
               &quot;Over 3 - Ahead&quot;)

## data wrangling ----

# load csv file from url
# url &lt;- paste0(&quot;https://huggingface.co/datasets/donaldye8812/&quot;,
#               &quot;nyc-2025-marathon-splits/resolve/main/&quot;,
#               &quot;nyrr_marathon_2025_summary_56480_runners_WITH_SPLITS.csv&quot;)
# df &lt;- read.csv(url)

# save locally
# write.csv(df, &quot;Output/Data/nyc_marathon_2025_splits.csv&quot;, row.names = FALSE)          

## main script ----

df &lt;- read.csv(&quot;Output/Data/nyc_marathon_2025_splits.csv&quot;)

times_df &lt;- df %&gt;%
  select(RunnerID, splitCode, time)
runners_df &lt;- df %&gt;%
  select(RunnerName, RunnerID, OverallTime, OverallPlace, Gender,
         Age, City, Country, Bib) %&gt;% 
  unique()
runners_df$OverallTime &lt;- as_hms(runners_df$OverallTime)

# unique pairs of splitCode and distance -- and add distance in km
split_distances &lt;- df %&gt;%
  select(splitCode, distance) %&gt;%
  unique()
split_distances$distance &lt;- c(4.83,5.00,6.44,8.05,9.66,10.00,11.27,12.87,14.48,
                              15.00,16.09,17.70,19.31,20.00,20.92,21.08,22.53,
                              24.14,25.00,25.75,27.36,28.97,30.00,30.58,32.19,
                              33.80,35.00,35.41,37.01,38.62,40.00,40.23,41.84,
                              42.20)

# merge split distances with times_df
times_df &lt;- merge(times_df, split_distances, by = &quot;splitCode&quot;, sort = FALSE)

# order the table by RunnerID and then by distance
times_df &lt;- times_df[order(times_df$RunnerID, times_df$distance), ]
row.names(times_df) &lt;- NULL
# time is character, change it
times_df$time &lt;- as_hms(times_df$time)

# make a df of RunnerID, OverallTime, and a new column called Category which is
# &quot;Sub 3&quot; or &quot;Over 3&quot;
category_df &lt;- runners_df %&gt;%
  select(RunnerID, OverallTime) %&gt;% 
  filter(OverallTime &gt; as_hms(&quot;02:55:00&quot;) & OverallTime &lt;= as_hms(&quot;03:05:00&quot;)) %&gt;%
  mutate(Category = ifelse(OverallTime &lt;= as_hms(&quot;03:00:00&quot;), &quot;Sub 3&quot;, &quot;Over 3&quot;))

# merge category_df with times_df to get the pace for each runner in each
# category and drop any rows with NA values
times_df &lt;- merge(times_df, category_df,
                  by = &quot;RunnerID&quot;, all.x = TRUE, sort = FALSE) %&gt;%
  filter(!is.na(Category)) %&gt;%
  mutate(on_par = time - (as_hms(&quot;03:00:00&quot;) /42.19 * distance))

ggplot(times_df, aes(x = distance, y = on_par, group = RunnerID, color = Category)) +
  geom_abline(slope = 0, intercept = 0, linetype = &quot;dashed&quot;, color = &quot;black&quot;) +
  geom_line(alpha = 0.2) +
  scale_color_manual(values = c(&quot;Sub 3&quot; = &quot;#ff6b59&quot;, &quot;Over 3&quot; = &quot;#464c89&quot;)) +
  labs(title = &quot;Difference from Goal Pace for Sub-3 and Over-3 Runners in NYC Marathon 2025&quot;,
       x = &quot;Distance (km)&quot;,
       y = &quot;Difference from Goal Pace (seconds)&quot;,
       color = &quot;Category&quot;) +
  theme_q() +
  guides(colour = guide_legend(override.aes = list(alpha = 1)))
ggsave(&quot;Output/Plots/sub_over_3_comparison.png&quot;, width = 7, height = 4, dpi = 300)

ggplot() +
  geom_abline(slope = 0, intercept = 0, linetype = &quot;dashed&quot;, color = &quot;black&quot;) +
  geom_line(data = times_df %&gt;%
              filter(Category == &quot;Sub 3&quot;),
            aes(x = distance, y = on_par, group = RunnerID),
            color = &quot;grey&quot;, alpha = 0.2) +
  geom_line(data = times_df %&gt;%
              filter(Category == &quot;Over 3&quot;),
            aes(x = distance, y = on_par, group = RunnerID),
            color = &quot;#464c89&quot;, alpha = 0.2) +
  labs(title = &quot;Over-3 Runners in NYC Marathon 2025&quot;,
       x = &quot;Distance (km)&quot;,
       y = &quot;Difference from Goal Pace (seconds)&quot;) +
  theme_q()
ggsave(&quot;Output/Plots/over_3_comparison.png&quot;, width = 7, height = 4, dpi = 300)

ggplot() +
  geom_abline(slope = 0, intercept = 0, linetype = &quot;dashed&quot;, color = &quot;black&quot;) +
  geom_line(data = times_df %&gt;% filter(Category == &quot;Over 3&quot;),
            aes(x = distance, y = on_par, group = RunnerID),
            color = &quot;grey&quot;, alpha = 0.2) +
  geom_line(data = times_df %&gt;% filter(Category == &quot;Sub 3&quot;),
            aes(x = distance, y = on_par, group = RunnerID),
            color = &quot;#ff6b59&quot;, alpha = 0.2) +
  labs(title = &quot;Sub-3 Runners in NYC Marathon 2025&quot;,
       x = &quot;Distance (km)&quot;,
       y = &quot;Difference from Goal Pace (seconds)&quot;) +
  theme_q()
ggsave(&quot;Output/Plots/sub_3_comparison.png&quot;, width = 7, height = 4, dpi = 300)

# classify on_par at the 10 km mark into three categories: &quot;Ahead&quot; for values
# less than -20, &quot;Goal Pace&quot; for values between -20 and 20, and &quot;Behind&quot; for
# values greater than 20, i.e. 2 seconds per km * 10 km = 20 seconds
class_df &lt;- times_df %&gt;%
  mutate(par_category = case_when(
    distance == 10 & on_par &lt; -20 ~ &quot;Ahead&quot;,
    distance == 10 &#038; on_par &gt;= -20 & on_par &lt;= 20 ~ &quot;Goal Pace&quot;,
    distance == 10 &#038; on_par &gt; 20 ~ &quot;Behind&quot;,
    TRUE ~ NA_character_
  )) %&gt;% 
  filter(!is.na(par_category))
# paste Category and par_category together to make a new column called final_category
class_df &lt;- class_df %&gt;%
  mutate(final_category = paste(Category, par_category, sep = &quot; - &quot;)) %&gt;% 
  select(RunnerID, final_category)
# merge class_df with times_df to get the final_category for each runner in each category and drop any rows with NA values
times_df &lt;- merge(times_df, class_df, by = &quot;RunnerID&quot;, all.x = TRUE) %&gt;%
  filter(!is.na(final_category))
# use my_levels to get facets in the order of my_level
times_df$final_category &lt;- factor(times_df$final_category, levels = my_levels)
# ggplot of on_par by distance colored by final_category
ggplot(times_df, aes(x = distance, y = on_par, group = RunnerID, color = final_category)) +
  geom_abline(slope = 0, intercept = 0, linetype = &quot;dashed&quot;, color = &quot;black&quot;) +
  geom_line(alpha = 0.2) +
  scale_color_manual(values = my_colours) +
  labs(title = &quot;Pacing at 10 km and Overall Outcome&quot;,
       x = &quot;Distance (km)&quot;,
       y = &quot;Difference from Par Time (seconds)&quot;,
       color = &quot;Category&quot;) +
  facet_wrap(~ final_category) +
  theme_q() +
  # placed after theme_q() so the custom theme does not re-enable the legend
  theme(legend.position = &quot;none&quot;) +
  guides(colour = guide_legend(override.aes = list(alpha = 1)))
ggsave(&quot;Output/Plots/pacing_by_final_category.png&quot;, width = 10, height = 6, dpi = 300)


# calculate the probability of success

# list of unique distances in numerical order
distance_list &lt;- sort(unique(times_df$distance))

all_p_df &lt;- tibble()
for(i in seq_along(distance_list)) {
  dist &lt;- distance_list[i]
  par &lt;- as_hms(&quot;00:00:02&quot;) * dist
  class_df &lt;- times_df %&gt;%
    mutate(par_category = case_when(
      distance == dist & on_par &lt; -par ~ &quot;Ahead&quot;,
      distance == dist &#038; on_par &gt;= -par & on_par &lt;= par ~ &quot;Goal Pace&quot;,
      distance == dist &#038; on_par &gt; par ~ &quot;Behind&quot;,
      TRUE ~ NA_character_
    )) %&gt;% 
    filter(!is.na(par_category)) %&gt;%
    select(RunnerID, Category, par_category) %&gt;%
    group_by(Category, par_category) %&gt;%
    summarise(count = n()) %&gt;%
    group_by(par_category) %&gt;%
    mutate(percentage = count / sum(count) * 100) %&gt;%
    ungroup() %&gt;% 
    mutate(distance = dist) %&gt;% 
    select(distance, Category, par_category, percentage)
  all_p_df &lt;- rbind(all_p_df, class_df)
}

all_p_df$final_category &lt;- paste(all_p_df$Category, all_p_df$par_category, sep = &quot; - &quot;)
all_p_df$final_category &lt;- factor(all_p_df$final_category, levels = my_levels)
all_p_df %&gt;% 
  filter(grepl(&quot;^Sub 3&quot;, final_category)) %&gt;% 
  ggplot(aes(x = distance, y = percentage, colour = final_category)) +
  geom_line() +
  scale_color_manual(values = my_colours) +
  labs(title = &quot;Probability of Success for Pacing Strategies by Distance&quot;,
       x = &quot;Distance (km)&quot;,
       y = &quot;Probability of Sub-3 (%)&quot;,
       color = &quot;Category&quot;) +
  theme_q()
ggsave(&quot;Output/Plots/probability_of_success_sub_3.png&quot;, width = 7, height = 4, dpi = 300)

</pre>


<p>—</p>



<p>The post title comes from “Marathon Man” by Ian Brown from his “My Way” album.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://quantixed.org/2026/04/14/marathon-man-ii-how-to-pace-a-marathon/"> Rstats – quantixed</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/marathon-man-ii-how-to-pace-a-marathon/">Marathon Man II: how to pace a marathon</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400545</post-id>	</item>
		<item>
		<title>Do AI coding agents save scientists time?</title>
		<link>https://www.r-bloggers.com/2026/04/do-ai-coding-agents-save-scientists-time/</link>
		
		<dc:creator><![CDATA[Seascapemodels]]></dc:creator>
		<pubDate>Mon, 13 Apr 2026 14:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.seascapemodels.org/posts/2026-04-14-does-agentic-AI-save-time/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>I’m often asked if using AI coding agents saves time. Yes they write code very quickly and can complete entire ecological data analyses.</p>
<p>Do agents really help when the deadlines are approaching?</p>
<p>Do agents really help when the deadlines are ...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/do-ai-coding-agents-save-scientists-time/">Do AI coding agents save scientists time?</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.seascapemodels.org/posts/2026-04-14-does-agentic-AI-save-time/"> Seascapemodels</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<p>I’m often asked if using AI coding agents saves time. Yes, they write code very quickly and can <a href="https://onlinelibrary.wiley.com/doi/10.1111/faf.70079" rel="nofollow" target="_blank">complete entire ecological data analyses</a>.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://i2.wp.com/www.seascapemodels.org/posts/2026-04-14-does-agentic-AI-save-time/coding-time.png?w=578&#038;ssl=1" class="img-fluid figure-img" data-recalc-dims="1"></p>
<figcaption>Do agents really help when the deadlines are approaching?</figcaption>
</figure>
</div>
<p>Do agents really help when the deadlines are approaching?</p>
<p>But the code also requires careful checking for logical errors. Our recent analysis shows this: the best LLMs could complete entire analyses and all the code ran well, but there was a decent chance of subtle logical errors, and correcting them would require a pretty deep human understanding of the code.</p>
<p>There’s another issue, and that is using code you don’t understand. I often find the agents produce so much code that I’m not comfortable using it until I understand the logic line-by-line.</p>
<p>In those cases I find it’s faster to use an autocomplete AI assistant so I’m going line-by-line, rather than an agentic loop that completes the entire piece of work.</p>
<p>I think the jury is still out on this question of whether there is a net time benefit to using agents. The only way to really answer is a randomised control trial where you time how long it takes scientists to fully complete a task.</p>
<p>The only study I’m aware of is quite limited and looked at software developers. They found the <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/" rel="nofollow" target="_blank">developers often projected they would be faster with the AI tools, but were actually slower at tasks by the end of the project.</a></p>
<p>It’s true that using AI is fun because it makes so much progress, but that fun feeling might be a trap for us.</p>
<p>It’s likely that the answer is context dependent.</p>
<p>I suspect for scientists most of the coding we do (like writing models that represent ecosystems) actually requires the human to understand what it does. In these cases agents don’t make sense, because you need to go back and review the code carefully to understand it anyway.</p>
<p>On the other hand, if you are making software tools that are easy to verify, then agents are great. For instance, I often use them to write code for non-standard figures. I don’t need to know the code in that case because I can check the output is correct visually.</p>
<p>Likewise, interactive shiny apps are another example of time saving. The agent can take some (good) code you already have and turn it into an app. It’s easy to test and check because you just use the app.</p>
<p>People often point to advances in LLMs and say that soon they will be good enough to do all the coding for us. I’m not so sure that applies to science. <a href="https://onlinelibrary.wiley.com/doi/10.1111/faf.70079" rel="nofollow" target="_blank">In fact, we found the later version of Claude Sonnet performed about the same as an earlier version on scientific logic, it just made different types of errors.</a></p>
<p>I think the advances need to come in the ways we interact and use the LLMs.</p>
<p>The ultimate goal should be efficient but also high-quality work. That’s something I want to look at in my next agentic AI study.</p>



 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.seascapemodels.org/posts/2026-04-14-does-agentic-AI-save-time/"> Seascapemodels</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/do-ai-coding-agents-save-scientists-time/">Do AI coding agents save scientists time?</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400543</post-id>	</item>
		<item>
		<title>reviser: Analyzing Real-Time Data Revisions in R</title>
		<link>https://www.r-bloggers.com/2026/04/reviser-analyzing-real-time-data-revisions-in-r/</link>
		
		<dc:creator><![CDATA[rOpenSci]]></dc:creator>
		<pubDate>Mon, 13 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://ropensci.org/blog/2026/04/13/reviser/</guid>

					<description><![CDATA[<p>Economic data are rarely static.<br />
Gross domestic product (GDP), inflation, employment, and other official statistics arrive as early estimates, then get revised as new source data arrive, seasonal adjustment is updated, or benchmarking changes are appl...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/reviser-analyzing-real-time-data-revisions-in-r/">reviser: Analyzing Real-Time Data Revisions in R</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://ropensci.org/blog/2026/04/13/reviser/"> rOpenSci - open tools for open science</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>Economic data are rarely static.
Gross domestic product (GDP), inflation, employment, and other official statistics arrive as early estimates, then get revised as new source data arrive, seasonal adjustment is updated, or benchmarking changes are applied.
Those revisions matter because they can change the narrative around turning points, policy mistakes, and forecast performance.</p>
<p><a href="https://docs.ropensci.org/reviser/" rel="nofollow" target="_blank"><code>reviser</code></a> is an R package by Marc Burri and Philipp Wegmüller for working with these vintage datasets directly.
A vintage dataset records multiple published versions of the same time series, so you can compare what was known at each release date with what was reported later.
<code>reviser</code> gives you a consistent workflow to:</p>
<ul>
<li>reshape release vintages between wide and tidy formats;</li>
<li>extract revisions relative to earlier or final releases;</li>
<li>summarize bias, dispersion, and serial dependence in revisions;</li>
<li>identify the first release that is statistically close to the eventual benchmark;</li>
<li>nowcast future revisions with state-space models.</li>
</ul>
<p>The package is aimed at users who already work with real-time macroeconomic data and want tools that go beyond plotting revision triangles by hand.
One design goal is to keep that workflow in pure R.</p>
<p><code>reviser</code> was reviewed through <a href="https://github.com/ropensci/software-review/issues/709" rel="nofollow" target="_blank">rOpenSci statistical software peer review</a>.
Many thanks to reviewers <a href="https://github.com/AlexGibberd" rel="nofollow" target="_blank">Alex Gibberd</a>, and <a href="https://github.com/TanguyBarthelemy" rel="nofollow" target="_blank">Tanguy Barthelemy</a>, and to editor <a href="https://github.com/rkillick" rel="nofollow" target="_blank">Rebecca Killick</a>, for feedback that improved the package.</p>
<h2>
Why revision analysis deserves its own workflow
</h2><p>Revisions are not just measurement noise.
They encode how information enters the data-production process.</p>
<ul>
<li>Some revisions reflect genuinely new information.</li>
<li>Others reflect noise that could, in principle, have been reduced earlier.</li>
<li>Still others come from methodological changes or benchmark updates.</li>
</ul>
<p>These distinctions matter if you are evaluating early data releases, building nowcasts, or asking whether first releases are already efficient summaries of the available information.</p>
<p>The <code>reviser</code> vignettes organize this workflow into three layers:</p>
<ol>
<li>Structure vintages consistently.</li>
<li>Measure and test revision properties.</li>
<li>Model the revision process when you want to predict future changes.</li>
</ol>
<h2>
A compact example with GDP vintages
</h2><p>The first step is to reshape the data into a tidy vintage format, where each row corresponds to an observed value, the date it refers to, and the publication date of that estimate.</p>
<p>The package ships with a GDP example dataset in long vintage format.
Suppose we want to focus on U.S. GDP growth, visualize how estimates moved during the 2008-09 global financial crisis, and then ask whether early releases were systematically biased relative to a later benchmark.</p>
<pre>library(reviser)
library(dplyr)
library(tsbox)

gdp_us &lt;- gdp |&gt;
 filter(id == &quot;US&quot;) |&gt;
 tsbox::ts_pc() |&gt;
 tsbox::ts_span(start = &quot;1980-01-01&quot;)

gdp_wide &lt;- vintages_wide(gdp_us)
gdp_long &lt;- vintages_long(gdp_wide, keep_na = FALSE)
</pre><p>With the vintages in tidy form, we can plot how the published path changed over time.
The y-axis in the figure reports quarter-on-quarter GDP growth rates.</p>
<pre>plot_vintages(
 gdp_long |&gt;
 filter(
 pub_date &gt;= as.Date(&quot;2009-01-01&quot;),
 pub_date &lt; as.Date(&quot;2010-01-01&quot;),
 time &gt; as.Date(&quot;2008-01-01&quot;),
 time &lt; as.Date(&quot;2010-01-01&quot;)
 ),
 type = &quot;line&quot;,
 title = &quot;Revisions of GDP during the 2008-09 global financial crisis&quot;,
 ylab = &quot;Quarter-on-quarter GDP growth rate&quot;
)
</pre><figure><img src="https://ropensci.org/blog/2026/04/13/reviser/gdp-example-plot-1.svg"
alt="Multiple vintage paths for U.S. GDP growth, highlighting how estimates published in 2009 changed over time." width="450"><figcaption>
<p>GDP growth vintages for the United States during the 2008-09 global financial crisis.</p>
</figcaption>
</figure>
<p>During volatile periods, the vintage paths can diverge enough that the story told by the first release is noticeably different from the story told a year later.</p>
<p>Once the data are in tidy vintage form, you can compare a set of early releases to a later benchmark release.</p>
<pre>final_release &lt;- get_nth_release(gdp_long, n = 10)
early_releases &lt;- get_nth_release(gdp_long, n = 0:6)

summary_tbl &lt;- get_revision_analysis(early_releases, final_release)

Warning: Both &#39;release&#39; and &#39;pub_date&#39; columns are present in &#39;df&#39;.
The &#39;release&#39; column will be used for grouping.

summary_tbl |&gt;
 select(id, release, `Bias (mean)`, `Bias (robust p-value)`, `Noise/Signal`)


=== Revision Analysis Summary ===
# A tibble: 7 × 5
  id    release   `Bias (mean)` `Bias (robust p-value)` `Noise/Signal`
  &lt;chr&gt; &lt;chr&gt;             &lt;dbl&gt;                   &lt;dbl&gt;          &lt;dbl&gt;
1 US    release_0        -0.014                   0.52           0.22
2 US    release_1        -0.015                   0.425          0.202
3 US    release_2        -0.013                   0.507          0.205
4 US    release_3        -0.003                   0.851          0.194
5 US    release_4        -0.014                   0.326          0.157
6 US    release_5        -0.021                   0.181          0.152
7 US    release_6        -0.018                   0.202          0.13
=== Interpretation ===
id=US, release=release_0:
• No significant bias detected (p = 0.52 )
• Moderate revision volatility (Noise/Signal = 0.22 )
id=US, release=release_1:
• No significant bias detected (p = 0.425 )
• Moderate revision volatility (Noise/Signal = 0.202 )
id=US, release=release_2:
• No significant bias detected (p = 0.507 )
• Moderate revision volatility (Noise/Signal = 0.205 )
id=US, release=release_3:
• No significant bias detected (p = 0.851 )
• Moderate revision volatility (Noise/Signal = 0.194 )
id=US, release=release_4:
• No significant bias detected (p = 0.326 )
• Moderate revision volatility (Noise/Signal = 0.157 )
id=US, release=release_5:
• No significant bias detected (p = 0.181 )
• Moderate revision volatility (Noise/Signal = 0.152 )
id=US, release=release_6:
• No significant bias detected (p = 0.202 )
• Moderate revision volatility (Noise/Signal = 0.13 )
</pre><p>This is where <code>reviser</code> moves beyond a reshape-and-plot package.
The revision summary reports quantities that applied work often needs but usually rebuilds ad hoc: mean bias, quantiles, volatility, noise-to-signal ratios, and hypothesis tests for bias, serial correlation, and news-versus-noise interpretations.</p>
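<p>As a rough cross-check on what the Noise/Signal column measures, a comparable ratio can be computed by hand from a pair of vintages: the standard deviation of the revisions relative to the standard deviation of the benchmark series. The numbers below are made up for illustration, and <code>reviser</code>&#39;s exact definition may differ in details (demeaning, choice of denominator):</p>
<pre># Hand-rolled noise-to-signal ratio on toy growth rates
first &lt;- c(0.5, -1.2, 0.3, 0.8, -0.1, 0.6)  # first-release values
final &lt;- c(0.6, -1.0, 0.2, 0.9, -0.3, 0.7)  # later benchmark values

revision &lt;- final - first
sd(revision) / sd(final)  # ~0.209 for these toy numbers
</pre>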
<p>In the bundled example, the early U.S. GDP releases over this sample show little evidence of systematic bias relative to the later benchmark.
The package also supports efficient-release diagnostics, where the question is not only whether revisions exist, but when additional revisions stop adding meaningful information.</p>
<pre>efficient_release &lt;- get_first_efficient_release(early_releases, final_release)
summary(efficient_release)

Efficient release: 0
Model summary:
Call:
stats::lm(formula = formula, data = df_wide)
Residuals:
     Min       1Q   Median       3Q      Max
-0.89186 -0.12669  0.02046  0.11475  0.97986

Coefficients:
            Estimate Std. Error t value Pr(&gt;|t|)
(Intercept)  0.00299    0.02223   0.134    0.893
release_0    0.97412    0.01692  57.567   &lt;2e-16 ***
---
Signif. codes: 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1

Residual standard error: 0.2518 on 166 degrees of freedom
  (10 observations deleted due to missingness)
Multiple R-squared: 0.9523, Adjusted R-squared: 0.952
F-statistic: 3314 on 1 and 166 DF, p-value: &lt; 2.2e-16

Test summary:
Linear hypothesis test:
(Intercept) = 0
release_0 = 1

Model 1: restricted model
Model 2: final ~ release_0

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F Pr(&gt;F)
1    168
2    166  2 1.9283 0.1486
</pre><p>That is exactly the kind of result that is hard to see from a revision triangle alone but straightforward to formalize once the workflow is standardized.
In this sample, the result points to the first release as already being statistically close to the later benchmark, which suggests subsequent revisions add relatively little systematic information.</p>
<h2>
From descriptive analysis to revision nowcasting
</h2><p>For many users, revision summaries will be the main use case.
But <code>reviser</code> also includes model-based tools for users who want to treat revisions as an explicit latent-data problem.
That matters if you need to make decisions on preliminary data but also want a structured way to estimate how those figures are likely to change later.</p>
<p>Two vignettes walk through nowcasting revisions with:</p>
<ul>
<li>the generalized Kishor-Koenig family via <code>kk_nowcast()</code>;</li>
<li>the Jacobs-Van Norden model via <code>jvn_nowcast()</code>.</li>
</ul>
<p>Both approaches cast the revision problem into state-space form, which makes it possible to estimate the dynamics of news and noise in successive data releases.
For technical users, this is the part of the package that turns revision analysis from retrospective diagnosis into a forecasting problem.</p>
<p>Here is a compact <code>kk_nowcast()</code> example following the Kishor-Koenig workflow from the <a href="https://docs.ropensci.org/reviser/articles/nowcasting-revisions-kk.html" rel="nofollow" target="_blank">vignette</a> for the Euro Area (EA).<br>
The key idea is to first identify an efficient release <code>e</code>, then estimate the revision system on the corresponding panel of releases.
In this euro area example, the efficient-release step selects <code>e = 2</code>, so the model treats the third published release as the earliest one that is already close to the later benchmark.
That is a useful substantive result on its own: it suggests that most of the economically relevant signal arrives within the first few releases, while later revisions are smaller adjustments around that path.</p>
<pre>gdp_ea &lt;- reviser::gdp |&gt;
 tsbox::ts_pc() |&gt;
 dplyr::filter(
 id == &quot;EA&quot;,
 time &gt;= min(pub_date),
 time &lt;= as.Date(&quot;2020-01-01&quot;)
 )

releases &lt;- get_nth_release(gdp_ea, n = 0:14)
final_release &lt;- get_nth_release(gdp_ea, n = 15)

efficient_release &lt;- get_first_efficient_release(releases, final_release)

fit_kk &lt;- kk_nowcast(
 df = efficient_release$data,
 e = efficient_release$e,
 model = &quot;KK&quot;,
 method = &quot;MLE&quot;
)

summary(fit_kk)


=== Kishor-Koenig Model ===
Convergence: Success
Log-likelihood: 125.7
AIC: -231.41
BIC: -198.23
Parameter Estimates:
 Parameter Estimate Std.Error
        F0    0.633     0.131
      G0_0    0.950     0.031
      G0_1   -0.037     0.152
      G0_2   -0.181     0.220
      G1_0   -0.009     0.011
      G1_1    0.594     0.061
      G1_2    0.194     0.092
        v0    0.380     0.068
      eps0    0.008     0.001
      eps1    0.001     0.000

plot(fit_kk)
</pre><figure><img src="https://ropensci.org/blog/2026/04/13/reviser/kk-nowcast-example-1.svg"
alt="Diagnostic plot from a Kishor-Koenig nowcast model for euro area GDP revisions, summarizing the fitted revision dynamics." width="450"><figcaption>
<p>Diagnostic plot from the Kishor-Koenig nowcast example.</p>
</figcaption>
</figure>
<p>The fitted object contains estimated parameters, filtered and smoothed latent states, and plotting methods for the implied efficient-release path.
That gives you a direct route from descriptive revision analysis to a state-space nowcast of future revisions.
For a broader audience, the main takeaway is not the individual coefficients.
It is that the model converges cleanly on this sample, summarizes the revision process in a compact latent-state form, and provides a practical way to judge whether a new release is likely to be revised materially later on.
Substantively, the model separates persistent signal from transitory revision noise, so the output is useful when you want to judge whether new releases are likely to be revised materially.</p>
<h2>
What reviser adds
</h2><p>What stands out in <code>reviser</code> is not a single function, but the coherence of the workflow:</p>
<ul>
<li>the package has explicit conventions for vintage data;</li>
<li>descriptive revision analysis and formal testing sit in the same API;</li>
<li>efficient-release analysis connects directly to applied questions about which release to trust;</li>
<li>nowcasting tools extend the same workflow rather than forcing a separate modeling stack.</li>
</ul>
<p>If you work with real-time macroeconomic data, that combination is useful because revision analysis is usually fragmented across custom scripts, one-off spreadsheets, and package combinations that do not share a common data structure.</p>
<h2>
Try it and push it further
</h2><p>You can install the package from the rOpenSci R-universe:</p>
<pre>install.packages(
 &quot;reviser&quot;,
 repos = c(&quot;https://ropensci.r-universe.dev&quot;, &quot;https://cloud.r-project.org&quot;)
)
</pre><p>Then start with the package site and vignettes:</p>
<ul>
<li>docs: <a href="https://docs.ropensci.org/reviser" rel="nofollow" target="_blank">https://docs.ropensci.org/reviser</a></li>
<li>source: <a href="https://github.com/ropensci/reviser" rel="nofollow" target="_blank">https://github.com/ropensci/reviser</a></li>
</ul>
<p>We would be happy to hear feedback from those of you trying out the package with different datasets.
If you have a real-time dataset with a different release structure, that would be a good stress test for the package.
If you find gaps in the workflow or have a use case to share, open an issue or contribute an example.
Revision analysis gets more useful as it becomes easier to compare workflows across datasets rather than rebuilding them from scratch each time.</p>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://ropensci.org/blog/2026/04/13/reviser/"> rOpenSci - open tools for open science</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/reviser-analyzing-real-time-data-revisions-in-r/">reviser: Analyzing Real-Time Data Revisions in R</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400526</post-id>	</item>
		<item>
		<title>`mlS3` — A Unified S3 Machine Learning Interface in R</title>
		<link>https://www.r-bloggers.com/2026/04/mls3-a-unified-s3-machine-learning-interface-in-r/</link>
		
		<dc:creator><![CDATA[T. Moudiki]]></dc:creator>
		<pubDate>Sun, 12 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://thierrymoudiki.github.io//blog/2026/04/12/r/intro-mlS3</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> Introduction to `mlS3` — A Unified S3 Machine Learning Interface in R</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/mls3-a-unified-s3-machine-learning-interface-in-r/">`mlS3` — A Unified S3 Machine Learning Interface in R</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://thierrymoudiki.github.io//blog/2026/04/12/r/intro-mlS3"> T. Moudiki's Webpage - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<h2 id="overview">Overview</h2>

<p>Following the <a href="https://thierrymoudiki.github.io/blog/2026/04/04/r/more-unifiedml-classifiers" rel="nofollow" target="_blank">R6 object-based package <code>unifiedml</code> introduced last week</a>, this blog post introduces the <a href="https://github.com/Techtonique/mlS3" rel="nofollow" target="_blank"><code>mlS3</code></a> R package, which strives to provide a <strong>unified, consistent <a href="https://adv-r.hadley.nz/s3.html" rel="nofollow" target="_blank">S3 interface</a></strong> for training and predicting with a variety of popular machine learning models. Rather than learning the idiosyncratic API of each package (you’d still need to read its docs for the parameter specifications, though), <code>mlS3</code> wraps them all under a common <code>wrap_*</code> / <code>predict()</code> pattern.</p>

<h2 id="what-youll-learn">What You’ll Learn</h2>

<ul>
  <li>How to install and load <code>mlS3</code> (for now, from GitHub)</li>
  <li>How to apply a consistent API across multiple ML algorithms for both <strong>classification</strong> and <strong>regression</strong> tasks</li>
</ul>

<h2 id="models-covered">Models Covered</h2>

<table>
  <thead>
    <tr>
      <th>Wrapper</th>
      <th>Underlying Package</th>
      <th>Task(s)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>wrap_glmnet()</code></td>
      <td><code>glmnet</code> generalized linear models</td>
      <td>Classification, Regression</td>
    </tr>
    <tr>
      <td><code>wrap_lightgbm()</code></td>
      <td><code>lightgbm</code> gradient boosting</td>
      <td>Classification, Regression</td>
    </tr>
    <tr>
      <td><code>wrap_ranger()</code></td>
      <td><code>ranger</code> random forest</td>
      <td>Classification, Regression</td>
    </tr>
    <tr>
      <td><code>wrap_svm()</code></td>
      <td><code>e1071</code> support vector machines</td>
      <td>Classification, Regression</td>
    </tr>
    <tr>
      <td><code>wrap_caret()</code></td>
      <td><code>caret</code> package</td>
      <td>Classification, Regression with <strong>200+ caret models</strong></td>
    </tr>
  </tbody>
</table>

<h2 id="datasets-used">Datasets Used</h2>

<ul>
  <li><strong><code>iris</code></strong> — binary and multiclass classification (setosa/versicolor, all three species)</li>
  <li><strong><code>mtcars</code></strong> — regression to predict miles per gallon (<code>mpg</code>)</li>
</ul>

<h2 id="key-design-principle">Key Design Principle</h2>

<p>All models follow the same two-step workflow:</p>

<pre>mod  &lt;- wrap_*(X_train, y_train, ...)       # Train
pred &lt;- predict(mod, newx = X_test, ...)    # Predict
</pre>
<p>This makes it easy to swap algorithms and compare results without rewriting your pipeline.</p>

<h2 id="prerequisites">Prerequisites</h2>

<ul>
  <li>R with the following packages: <code>remotes</code>, <code>caret</code>, <code>randomForest</code>, <code>ggplot2</code></li>
  <li><code>mlS3</code> installed from GitHub (for now) via <code>remotes::install_github(&quot;Techtonique/mlS3&quot;)</code></li>
</ul>

<h2 id="code">Code</h2>

<h3 id="install-packages">Install packages</h3>

<pre>install.packages(c(&quot;remotes&quot;))

install.packages(c(&quot;caret&quot;))

install.packages(c(&quot;randomForest&quot;))

remotes::install_github(&quot;Techtonique/mlS3&quot;)
</pre>

<h3 id="predefined-wrappers">Predefined wrappers</h3>

<pre># Classification

library(mlS3)

# =============================================================================
# Classification examples (no leakage)
# =============================================================================
set.seed(123)

# --- Binary classification: iris setosa vs versicolor ---
iris_bin &lt;- iris[iris$Species != &quot;virginica&quot;, ]
X_bin &lt;- iris_bin[, 1:4]
y_bin &lt;- droplevels(iris_bin$Species)

# Split into train/test
idx_bin &lt;- sample(nrow(X_bin), 0.7 * nrow(X_bin))
X_bin_train &lt;- X_bin[idx_bin, ]
y_bin_train &lt;- y_bin[idx_bin]
X_bin_test  &lt;- X_bin[-idx_bin, ]
y_bin_test  &lt;- y_bin[-idx_bin]

# glmnet
mod &lt;- wrap_glmnet(X_bin_train, y_bin_train, family = &quot;binomial&quot;)
pred_bin_glmnet &lt;- predict(mod, newx = X_bin_test, type = &quot;class&quot;)
acc_glmnet &lt;- mean(pred_bin_glmnet == y_bin_test)

cat(&quot;Accuracy (glmnet): &quot;, acc_glmnet, &quot;\n&quot;)


# --- Multiclass classification: iris all species ---
X_multi &lt;- iris[, 1:4]
y_multi &lt;- iris$Species

# Split into train/test
idx_multi &lt;- sample(nrow(X_multi), 0.7 * nrow(X_multi))
X_multi_train &lt;- X_multi[idx_multi, ]
y_multi_train &lt;- y_multi[idx_multi]
X_multi_test  &lt;- X_multi[-idx_multi, ]
y_multi_test  &lt;- y_multi[-idx_multi]

# lightgbm
mod &lt;- wrap_lightgbm(X_multi_train, y_multi_train,
                     params = list(objective = &quot;multiclass&quot;,
                                   num_class = 3, verbose = -1),
                     nrounds = 150)
pred_multi_lightgbm &lt;- predict(mod, newx = X_multi_test, type = &quot;class&quot;)
acc_lightgbm &lt;- mean(pred_multi_lightgbm == y_multi_test)

# ranger
mod &lt;- wrap_ranger(X_multi_train, y_multi_train, num.trees = 100L)
pred_multi_ranger &lt;- predict(mod, newx = X_multi_test, type = &quot;class&quot;)
acc_ranger &lt;- mean(pred_multi_ranger == y_multi_test)

# svm
mod &lt;- wrap_svm(X_multi_train, y_multi_train, kernel = &quot;radial&quot;)
pred_multi_svm &lt;- predict(mod, newx = X_multi_test, type = &quot;class&quot;)
acc_svm &lt;- mean(pred_multi_svm == y_multi_test)

cat(&quot;Accuracy (lightgbm): &quot;, acc_lightgbm, &quot;\n&quot;)
cat(&quot;Accuracy (ranger): &quot;, acc_ranger, &quot;\n&quot;)
cat(&quot;Accuracy (svm): &quot;, acc_svm, &quot;\n&quot;)


# Regression


# =============================================================================
# Regression examples (mtcars)
# =============================================================================
X_reg &lt;- mtcars[, -1]
y_reg &lt;- mtcars$mpg

# Split into train/test
set.seed(123)
idx_reg &lt;- sample(nrow(X_reg), 0.7 * nrow(X_reg))
X_reg_train &lt;- X_reg[idx_reg, ];  y_reg_train &lt;- y_reg[idx_reg]
X_reg_test  &lt;- X_reg[-idx_reg, ]; y_reg_test  &lt;- y_reg[-idx_reg]

# lightgbm
mod &lt;- wrap_lightgbm(X_reg_train, y_reg_train,
                     params = list(objective = &quot;regression&quot;, verbose = -1),
                     nrounds = 50)
pred_reg_lightgbm &lt;- predict(mod, newx = X_reg_test)
rmse_lightgbm &lt;- sqrt(mean((pred_reg_lightgbm - y_reg_test)^2))

# glmnet
mod &lt;- wrap_glmnet(X_reg_train, y_reg_train, alpha = 0)
pred_reg_glmnet &lt;- predict(mod, newx = X_reg_test)
rmse_glmnet &lt;- sqrt(mean((pred_reg_glmnet - y_reg_test)^2))

# svm
mod &lt;- wrap_svm(X_reg_train, y_reg_train)
pred_reg_svm &lt;- predict(mod, newx = X_reg_test)
rmse_svm &lt;- sqrt(mean((pred_reg_svm - y_reg_test)^2))

# ranger
mod &lt;- wrap_ranger(X_reg_train, y_reg_train, num.trees = 100L)
pred_reg_ranger &lt;- predict(mod, newx = X_reg_test)
rmse_ranger &lt;- sqrt(mean((pred_reg_ranger - y_reg_test)^2))

cat(&quot;RMSE (lightgbm): &quot;, rmse_lightgbm, &quot;\n&quot;)
cat(&quot;RMSE (glmnet): &quot;, rmse_glmnet, &quot;\n&quot;)
cat(&quot;RMSE (svm): &quot;, rmse_svm, &quot;\n&quot;)
cat(&quot;RMSE (ranger): &quot;, rmse_ranger, &quot;\n&quot;)



Accuracy (glmnet):  1 
Accuracy (lightgbm):  0.2444444  # I'm probably doing something wrong here
Accuracy (ranger):  0.9333333 
Accuracy (svm):  0.9333333 
RMSE (lightgbm):  4.713678 
RMSE (glmnet):  2.972557 
RMSE (svm):  2.275837 
RMSE (ranger):  2.067692 
</pre>
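<p>Because every <code>wrap_*()</code> call shares the same fit/predict signature, the regression comparison above can also be written as a loop over wrappers, which is the main payoff of the unified interface. This is a sketch, assuming <code>mlS3</code> and the underlying packages are installed and loaded as in the examples above, and that the wrappers&#39; regression defaults are acceptable:</p>
<pre># Swap algorithms by looping over wrappers with identical signatures
# (assumes library(mlS3) has been loaded as above)
set.seed(123)
X &lt;- mtcars[, -1]; y &lt;- mtcars$mpg
idx &lt;- sample(nrow(X), 0.7 * nrow(X))

wrappers &lt;- list(glmnet = wrap_glmnet, svm = wrap_svm, ranger = wrap_ranger)
rmse &lt;- sapply(wrappers, function(w) {
  mod  &lt;- w(X[idx, ], y[idx])
  pred &lt;- predict(mod, newx = X[-idx, ])
  sqrt(mean((pred - y[-idx])^2))
})
sort(rmse)
</pre>
<p>Adding a new algorithm to the comparison is then one more entry in the list, not a new pipeline.</p>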

<h3 id="caret-wrapper"><code>caret</code> wrapper</h3>

<p>For this part, you need to install the <code>caret</code> and <code>randomForest</code> packages. The models (and their tuning parameters) available through <code>caret</code> are listed at <a href="https://topepo.github.io/caret/available-models.html" rel="nofollow" target="_blank">https://topepo.github.io/caret/available-models.html</a>.</p>

<pre>library(mlS3)
library(caret)

# ============================================================================
# Regression with mtcars dataset
# ============================================================================
data(mtcars)

# Prepare data
X_reg &lt;- mtcars[, -1]  # All except mpg
y_reg &lt;- mtcars$mpg     # Target variable

# Split into train/test
set.seed(123)
idx_reg &lt;- sample(nrow(X_reg), 0.7 * nrow(X_reg))
X_reg_train &lt;- X_reg[idx_reg, ]
y_reg_train &lt;- y_reg[idx_reg]
X_reg_test &lt;- X_reg[-idx_reg, ]
y_reg_test &lt;- y_reg[-idx_reg]

# ----------------------------------------------------------------------------
# Example 1: Random Forest with specific parameters
# ----------------------------------------------------------------------------
cat(&quot;\n=== Example 1: Random Forest Regression ===\n&quot;)

mod_rf &lt;- wrap_caret(X_reg_train, y_reg_train,
                     method = &quot;rf&quot;,
                     mtry = 3)        # Number of variables sampled at each split

print(mod_rf)

# Predictions
pred_rf &lt;- predict(mod_rf, newx = X_reg_test)
rmse_rf &lt;- sqrt(mean((pred_rf - y_reg_test)^2))
r2_rf &lt;- 1 - sum((y_reg_test - pred_rf)^2) / sum((y_reg_test - mean(y_reg_test))^2)

cat(&quot;RMSE:&quot;, round(rmse_rf, 3), &quot;\n&quot;)
cat(&quot;R-squared:&quot;, round(r2_rf, 3), &quot;\n&quot;)


=== Example 1: Random Forest Regression ===
$model
Random Forest 

22 samples
10 predictors

No pre-processing
Resampling: None 

$task
[1] &quot;regression&quot;

$method
[1] &quot;rf&quot;

$parameters
$parameters$mtry
[1] 3


attr(,&quot;class&quot;)
[1] &quot;wrap_caret&quot;
RMSE: 2.007 
R-squared: 0.681 

library(ggplot2)

df &lt;- data.frame(
  pred = pred_rf,
  actual = y_reg_test
)

ggplot(df, aes(x = pred, y = actual)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, color = &quot;red&quot;) +
  theme_minimal() +
  labs(x = &quot;Predicted&quot;, y = &quot;Actual&quot;)
</pre>

<p><img src="https://i0.wp.com/thierrymoudiki.github.io/images/2026-04-12/2026-04-12-intro-mlS3_7_0.png?w=578&#038;ssl=1" alt="image-title-here" class="img-responsive" data-recalc-dims="1" /></p>


<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://thierrymoudiki.github.io//blog/2026/04/12/r/intro-mlS3"> T. Moudiki's Webpage - R</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/mls3-a-unified-s3-machine-learning-interface-in-r/">`mlS3` — A Unified S3 Machine Learning Interface in R</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400505</post-id>	</item>
		<item>
		<title>Test &#038; Roll: Why Smaller A/B Tests Can Make More Money</title>
		<link>https://www.r-bloggers.com/2026/04/test-roll-why-smaller-a-b-tests-can-make-more-money/</link>
		
		<dc:creator><![CDATA[Florian Teschner]]></dc:creator>
		<pubDate>Sun, 12 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://flovv.github.io/test-and-roll-profit-maximizing-ab-tests/</guid>

					<description><![CDATA[<p>Short practical advice on A/B testing:</p>
<p>    Stop sizing tests only for statistical significance - In finite campaigns, your goal is profit, not perfect inference.</p>
<p>    Treat testing as a trade-off - Every extra test exposure buys learning but ...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/test-roll-why-smaller-a-b-tests-can-make-more-money/">Test & Roll: Why Smaller A/B Tests Can Make More Money</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="http://flovv.github.io/test-and-roll-profit-maximizing-ab-tests/"> Florian Teschner</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<h2 id="short-practical-advice-on-ab-testing">Short practical advice on A/B testing:</h2>

<ol>
  <li>
    <p><strong>Stop sizing tests only for statistical significance</strong> &#8211; In finite campaigns, your goal is profit, not perfect inference.</p>
  </li>
  <li>
    <p><strong>Treat testing as a trade-off</strong> &#8211; Every extra test exposure buys learning but also burns revenue if that exposure gets the weaker treatment.</p>
  </li>
  <li>
    <p><strong>Use smaller tests when outcomes are noisy</strong> &#8211; This paper shows profit-maximizing test sizes rise much more slowly than classical power-based sizes.</p>
  </li>
  <li>
    <p><strong>Scale test size with reachable audience</strong> &#8211; If your population is limited, test size should reflect that constraint directly.</p>
  </li>
  <li>
    <p><strong>Allow unequal splits when priors differ</strong> &#8211; If one treatment is likely better a priori (e.g., treatment vs holdout), asymmetric test cells can be optimal.</p>
  </li>
</ol>

<h2 id="shiny-app-to-test-the-implications">Shiny App to test the implications:</h2>

<p><a href="https://testandroll.shinyapps.io/testandroll/" rel="nofollow" target="_blank">Test and Roll Shiny App</a></p>

<h2 id="long-version">Long Version</h2>

<p>I just read <em>Test &#038; Roll: Profit-Maximizing A/B Tests</em> by Elea McDonnell Feit and Ron Berman (2019), and it challenges one of the default habits in marketing experimentation: planning tests as if the main objective were statistical significance.</p>

<p>Their point is simple: in most real marketing experiments, you have a finite population (email list, campaign budget, limited traffic window). In that setting, the right objective is <strong>total expected profit across test + rollout</strong>, not p-values.</p>

<h3 id="the-core-idea">The core idea</h3>

<p>A classic A/B setup has two stages:</p>

<ol>
  <li><strong>Test stage</strong>: expose <code>n1</code> users to treatment A and <code>n2</code> users to treatment B.</li>
  <li><strong>Roll stage</strong>: deploy the winner to the remaining <code>N - n1 - n2</code> users.</li>
</ol>

<p>Bigger tests improve certainty, but they also create opportunity cost: more users in test means more users potentially seeing the weaker treatment before rollout.</p>

<p>The paper formalizes this as a decision problem and derives <strong>profit-maximizing sample sizes</strong>. Under Normal priors and Normal outcomes, they get closed-form solutions.</p>
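<p>To make the closed form concrete, here is a small base-R sketch of the symmetric Normal-Normal test size. The function name and exact constants are my reading of the paper&#39;s formula (response noise <code>s</code>, prior standard deviation <code>sigma</code> on each treatment mean), so treat them as an assumption to verify against the paper or the Shiny app linked above:</p>
<pre># Symmetric Normal-Normal profit-maximizing test size
# (my reading of the Feit &amp; Berman 2019 closed form; verify the constants):
#   N     - total reachable population
#   s     - standard deviation of the response (noise)
#   sigma - prior standard deviation of each treatment&#39;s mean response
test_size_nn &lt;- function(N, s, sigma) {
  r &lt;- (s / sigma)^2
  sqrt(N / 4 * r + (3 / 4 * r)^2) - 3 / 4 * r
}

test_size_nn(N = 1e5, s = 0.1, sigma = 0.01)  # per-arm test size, ~1508
test_size_nn(N = 4e5, s = 0.1, sigma = 0.01)  # ~3088: 4x the N, ~2x the n

# Classical power-based sizing for the same minimum detectable effect:
power.t.test(delta = 0.01, sd = 0.1, sig.level = 0.05, power = 0.8)$n
</pre>
<p>For these (made-up) inputs the two sizes happen to be comparable; the difference shows up in how they scale. The profit-maximizing size grows roughly with <code>sqrt(N)</code> and linearly in <code>s</code>, while the classical size ignores <code>N</code> and grows with <code>s^2</code>.</p>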

<h3 id="why-this-matters-in-practice">Why this matters in practice</h3>

<p>If you use classical hypothesis-test sizing, recommended <code>n</code> can be huge, especially when effect sizes are small and response is noisy (which is exactly what we see in advertising).</p>

<p>Their framework produces much smaller test sizes because it optimizes business outcomes, not Type I/II error control.</p>

<p>Two important takeaways:</p>

<ol>
  <li><strong>Optimal test sizes grow sub-linearly with response noise</strong>, while classical sample size rules grow much faster.</li>
  <li><strong>Optimal test sizes scale with the square root of population size <code>N</code></strong>, which makes them workable for smaller markets and finite campaigns.</li>
</ol>

<h3 id="comparison-with-bandits">Comparison with bandits</h3>

<p>The authors benchmark against Thompson sampling (multi-armed bandit). Bandits usually win on pure optimization, but the gap is often modest in their examples.</p>

<p>That is useful operationally: a two-stage “test then roll” process is far easier to implement, explain, and govern than a continuously adapting bandit, especially in organizations with approval and reporting constraints.</p>

<h3 id="the-applications-are-the-best-part">The applications are the best part</h3>

<p>They test the approach in three contexts:</p>

<ol>
  <li>Website design experiments</li>
  <li>Display advertising decisions</li>
  <li>Catalog holdout tests</li>
</ol>

<p>Across cases, profit-maximizing designs use <strong>substantially smaller test cells</strong> than classical power calculations and produce higher expected profit.</p>

<p>A particularly practical result: small holdout groups (common in catalog and CRM practice) can be fully rational when priors are asymmetric. In other words, “unequal splits” are not always bad design; they can be the optimal design.</p>

<h3 id="what-i-changed-in-my-own-thinking">What I changed in my own thinking</h3>

<p>Before this, I treated “underpowered” mostly as a red flag. After this paper, I think a better question is:</p>

<p><strong>Underpowered for what objective?</strong></p>

<p>If the objective is publication-grade inference, classical power logic is right.
If the objective is campaign profit in a finite horizon, a smaller test can be the better business decision.</p>

<h3 id="practical-implementation-checklist">Practical implementation checklist</h3>

<p>If you run tactical tests (email, paid media, landing pages), this paper suggests a better workflow:</p>

<ol>
  <li>Define total reachable population <code>N</code> for the decision horizon.</li>
  <li>Set priors for treatment means from past similar experiments.</li>
  <li>Estimate response variance from historical data.</li>
  <li>Compute profit-maximizing <code>n1</code>, <code>n2</code>.</li>
  <li>Pre-commit the rollout decision rule (posterior expected profit winner).</li>
  <li>Report expected regret alongside expected upside.</li>
</ol>

<p>That last point is underrated: decision-makers usually understand “expected dollars at risk” better than p-values.</p>

<h3 id="bottom-line">Bottom line</h3>

<p>For many real marketing tests, “smaller than textbook” is not bad science. It is better decision design.</p>

<p>If your experiment exists to drive a business action on a finite audience, <em>Test &#038; Roll</em> gives a rigorous way to choose sample sizes that maximize profit instead of statistical purity.</p>

<hr />

<p>Paper: Feit, E. M., &#038; Berman, R. (2019). <em>Test &#038; Roll: Profit-Maximizing A/B Tests</em>. SSRN: https://ssrn.com/abstract=3274875</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="http://flovv.github.io/test-and-roll-profit-maximizing-ab-tests/"> Florian Teschner</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/test-roll-why-smaller-a-b-tests-can-make-more-money/">Test & Roll: Why Smaller A/B Tests Can Make More Money</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400493</post-id>	</item>
		<item>
		<title>Machine Learning Frameworks in R</title>
		<link>https://www.r-bloggers.com/2026/04/machine-learning-frameworks-in-r/</link>
		
		<dc:creator><![CDATA[R&#039;tichoke]]></dc:creator>
		<pubDate>Sat, 11 Apr 2026 18:30:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://rtichoke.netlify.app/posts/ml-frameworks-in-r.html</guid>

					<description><![CDATA[<p>R’s ecosystem offers a rich selection of machine learning frameworks, each with distinct design philosophies and strengths. This post is a side-by-side comparison of five ML frameworks in R that provide unified interfaces over multiple algorithms...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/machine-learning-frameworks-in-r/">Machine Learning Frameworks in R</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://rtichoke.netlify.app/posts/ml-frameworks-in-r.html"> R&#039;tichoke</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 




<p>R’s ecosystem offers a rich selection of machine learning frameworks, each with distinct design philosophies and strengths. This post is a side-by-side comparison of five ML frameworks in R that provide <strong>unified interfaces</strong> over multiple algorithms, with runnable code examples on the same dataset so you can compare APIs directly. The focus is on packages that let you swap algorithms without rewriting your code.</p>
<section id="frameworks-at-a-glance" class="level2">
<h2 class="anchored" data-anchor-id="frameworks-at-a-glance">Frameworks at a Glance</h2>
<table class="caption-top table">
<colgroup>
<col style="width: 16%">
<col style="width: 16%">
<col style="width: 16%">
<col style="width: 16%">
<col style="width: 16%">
<col style="width: 16%">
</colgroup>
<thead>
<tr class="header">
<th>Feature</th>
<th>tidymodels</th>
<th>caret</th>
<th>mlr3</th>
<th>h2o</th>
<th>qeML</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Built-in tuning</strong></td>
<td><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> (<code>tune</code>)</td>
<td><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></td>
<td><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> (<code>mlr3tuning</code>)</td>
<td><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> (AutoML)</td>
<td><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> (<code>qeFT()</code>)</td>
</tr>
<tr class="even">
<td><strong>Preprocessing pipeline</strong></td>
<td><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> (<code>recipes</code>)</td>
<td><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> (<code>preProcess</code>)</td>
<td><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> (<code>mlr3pipelines</code>)</td>
<td><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></td>
<td><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/274c.png" alt="❌" class="wp-smiley" style="height: 1em; max-height: 1em;" /></td>
</tr>
<tr class="odd">
<td><strong>Model variety</strong></td>
<td>200+ engines</td>
<td>230+ methods</td>
<td>100+ learners</td>
<td>GBM, GLM, DL, DRF</td>
<td>20+ wrappers</td>
</tr>
<tr class="even">
<td><strong>Relative speed</strong></td>
<td>Moderate</td>
<td>Moderate</td>
<td>Moderate</td>
<td>Fast (distributed)</td>
<td>Depends on backend</td>
</tr>
<tr class="odd">
<td><strong>Learning curve</strong></td>
<td>Medium</td>
<td>Low</td>
<td>High</td>
<td>Low</td>
<td>Very low</td>
</tr>
<tr class="even">
<td><strong>Active development</strong></td>
<td><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></td>
<td><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/26a0.png" alt="⚠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Maintenance mode</td>
<td><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></td>
<td><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></td>
<td><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></td>
</tr>
<tr class="odd">
<td><strong>Best for</strong></td>
<td>Production pipelines</td>
<td>Quick prototyping</td>
<td>Benchmarking</td>
<td>AutoML &#038; scale</td>
<td>Teaching &#038; exploration</td>
</tr>
</tbody>
</table>
</section>
<section id="setup-and-data" class="level2">
<h2 class="anchored" data-anchor-id="setup-and-data">Setup and Data</h2>
<p>All examples below use the <code>iris</code> classification task: predict <code>Species</code> from the four numeric measurements. A single train/test split is created up front so results are directly comparable.</p>
<div class="cell">
<pre>library(dplyr)

# Reproducible train/test split (framework-agnostic)
set.seed(42)
n &lt;- nrow(iris)
train_idx &lt;- sample(seq_len(n), size = floor(0.7 * n))

train_data &lt;- iris[train_idx, ]
test_data  &lt;- iris[-train_idx, ]

# Store accuracy results for final comparison
results &lt;- data.frame(
  Framework = character(),
  Model = character(),
  Accuracy = numeric(),
  stringsAsFactors = FALSE
)

cat(&quot;Training set:&quot;, nrow(train_data), &quot;observations\n&quot;)</pre>
<div class="cell-output cell-output-stdout">
<pre>Training set: 105 observations</pre>
</div>
<pre>cat(&quot;Test set:    &quot;, nrow(test_data), &quot;observations\n&quot;)</pre>
<div class="cell-output cell-output-stdout">
<pre>Test set:     45 observations</pre>
</div>
</div>
<hr>
</section>
<section id="tidymodels" class="level2">
<h2 class="anchored" data-anchor-id="tidymodels">1. tidymodels</h2>
<p>The <a href="https://www.tidymodels.org/" rel="nofollow" target="_blank">tidymodels</a> ecosystem is the modern, tidyverse-native approach to modeling in R. It provides a consistent grammar for specifying models (<code>parsnip</code>), preprocessing (<code>recipes</code>), composing workflows (<code>workflows</code>), and tuning hyperparameters (<code>tune</code>).</p>
<div class="cell">
<pre>library(tidymodels)

# Define a recipe (preprocessing)
rec &lt;- recipe(Species ~ ., data = train_data)

# Define a model specification
rf_spec &lt;- rand_forest(trees = 500) %&gt;%
  set_engine(&quot;ranger&quot;) %&gt;%
  set_mode(&quot;classification&quot;)

# Combine into a workflow
rf_wf &lt;- workflow() %&gt;%
  add_recipe(rec) %&gt;%
  add_model(rf_spec)

# Fit the workflow
rf_fit &lt;- rf_wf %&gt;% fit(data = train_data)

# Predict on test set
preds_tidy &lt;- predict(rf_fit, test_data) %&gt;%
  bind_cols(test_data %&gt;% select(Species))

# Evaluate
acc_tidy &lt;- accuracy(preds_tidy, truth = Species, estimate = .pred_class)
acc_tidy</pre>
<div class="cell-output cell-output-stdout">
<pre># A tibble: 1 × 3
  .metric  .estimator .estimate
  &lt;chr&gt;    &lt;chr&gt;          &lt;dbl&gt;
1 accuracy multiclass     0.978</pre>
</div>
<pre>results &lt;- rbind(results, data.frame(
  Framework = &quot;tidymodels&quot;,
  Model = &quot;Random Forest (ranger)&quot;,
  Accuracy = acc_tidy$.estimate
))</pre>
</div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Key Strengths
</div>
</div>
<div class="callout-body-container callout-body">
<ul>
<li>Composable pipeline: <code>recipe</code> + <code>model</code> + <code>workflow</code> is easy to extend</li>
<li>Swap engines with one line (<code>set_engine(&quot;xgboost&quot;)</code>)</li>
<li>Seamless cross-validation and hyperparameter tuning via <code>tune_grid()</code> / <code>tune_bayes()</code></li>
<li>Deep integration with the tidyverse</li>
</ul>
</div>
</div>
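<p>The engine swap is worth seeing concretely. A sketch that rebuilds the same 70% split and fits the identical workflow on a different backend (assumes the <code>randomForest</code> package is installed):</p>

```r
library(tidymodels)

# Same split convention as above (70% of iris)
set.seed(42)
train_data <- iris[sample(nrow(iris), 105), ]

# Same model type, different backend: ranger -> randomForest
rf_spec_rf <- rand_forest(trees = 500) %>%
  set_engine("randomForest") %>%
  set_mode("classification")

# The recipe and workflow are untouched; only the engine line changed
rf_fit_rf <- workflow() %>%
  add_recipe(recipe(Species ~ ., data = train_data)) %>%
  add_model(rf_spec_rf) %>%
  fit(data = train_data)
```

<p>Because <code>parsnip</code> translates arguments like <code>trees</code> to each engine's native names, none of the downstream code has to change.</p>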
<hr>
</section>
<section id="caret" class="level2">
<h2 class="anchored" data-anchor-id="caret">2. caret</h2>
<p>The <a href="https://topepo.github.io/caret/" rel="nofollow" target="_blank"><code>caret</code></a> package (Classification And REgression Training) was the de facto standard for ML in R for over a decade. It wraps 230+ models behind a single <code>train()</code> interface. Although it is now in maintenance mode (its creator, Max Kuhn, leads <code>tidymodels</code>), it remains widely used.</p>
<div class="cell">
<pre>library(caret)

# Train a random forest with 5-fold CV
ctrl &lt;- trainControl(method = &quot;cv&quot;, number = 5)

rf_caret &lt;- train(
  Species ~ .,
  data = train_data,
  method = &quot;rf&quot;,
  trControl = ctrl,
  tuneLength = 3  # Try 3 values of mtry
)

# Best tuning parameter
rf_caret$bestTune</pre>
<div class="cell-output cell-output-stdout">
<pre>  mtry
1    2</pre>
</div>
<pre># Predict on test set
preds_caret &lt;- predict(rf_caret, test_data)

# Evaluate
cm_caret &lt;- confusionMatrix(preds_caret, test_data$Species)
cm_caret$overall[&quot;Accuracy&quot;]</pre>
<div class="cell-output cell-output-stdout">
<pre> Accuracy 
0.9555556 </pre>
</div>
<pre>results &lt;- rbind(results, data.frame(
  Framework = &quot;caret&quot;,
  Model = &quot;Random Forest (rf)&quot;,
  Accuracy = as.numeric(cm_caret$overall[&quot;Accuracy&quot;])
))</pre>
</div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Key Strengths
</div>
</div>
<div class="callout-body-container callout-body">
<ul>
<li>Minimal API: a single <code>train()</code> call handles preprocessing, tuning, and fitting</li>
<li>230+ model methods available out of the box</li>
<li>Built-in <code>confusionMatrix()</code> with extensive diagnostics</li>
<li>Massive community knowledge base and Stack Overflow coverage</li>
</ul>
</div>
</div>
<hr>
</section>
<section id="mlr3" class="level2">
<h2 class="anchored" data-anchor-id="mlr3">3. mlr3</h2>
<p><a href="https://mlr3.mlr-org.com/" rel="nofollow" target="_blank"><code>mlr3</code></a> is a modern, object-oriented ML framework built on R6 classes. It excels at systematic benchmarking, composable pipelines, and reproducible experiments. The learning curve is steeper, but the payoff is a powerful, extensible architecture.</p>
<div class="cell">
<pre>library(mlr3)
library(mlr3learners)

# Define the task
task &lt;- TaskClassif$new(
  id = &quot;iris&quot;,
  backend = train_data,
  target = &quot;Species&quot;
)

# Define the learner
learner &lt;- lrn(&quot;classif.ranger&quot;, num.trees = 500)

# Train
learner$train(task)

# Predict on test data — create a test task to avoid backend storage issues
test_task &lt;- TaskClassif$new(
  id = &quot;iris_test&quot;,
  backend = test_data,
  target = &quot;Species&quot;
)
pred_mlr3 &lt;- learner$predict(test_task)

# Evaluate
acc_mlr3 &lt;- pred_mlr3$score(msr(&quot;classif.acc&quot;))
acc_mlr3</pre>
<div class="cell-output cell-output-stdout">
<pre>classif.acc 
  0.9555556 </pre>
</div>
<pre>results &lt;- rbind(results, data.frame(
  Framework = &quot;mlr3&quot;,
  Model = &quot;Random Forest (ranger)&quot;,
  Accuracy = as.numeric(acc_mlr3)
))</pre>
</div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Key Strengths
</div>
</div>
<div class="callout-body-container callout-body">
<ul>
<li>R6 object-oriented design — everything is an object with methods</li>
<li>First-class benchmarking: compare multiple learners on multiple tasks with <code>benchmark()</code></li>
<li>Composable pipelines via <code>mlr3pipelines</code> (stacking, ensembling, feature engineering)</li>
<li>Built-in resampling strategies and performance measures</li>
</ul>
</div>
</div>
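<p>The <code>benchmark()</code> workflow mentioned above looks like this in practice (a sketch; assumes <code>mlr3learners</code> plus the <code>ranger</code> and <code>rpart</code> backends are installed):</p>

```r
library(mlr3)
library(mlr3learners)

# Compare two learners on the built-in iris task with 5-fold CV
set.seed(42)
design <- benchmark_grid(
  tasks       = tsk("iris"),
  learners    = lrns(c("classif.ranger", "classif.rpart")),
  resamplings = rsmp("cv", folds = 5)
)

bmr <- benchmark(design)
bmr$aggregate(msr("classif.acc"))  # one aggregated accuracy row per learner
```

<p>The same <code>design</code> object scales to many tasks and learners at once, which is where mlr3's benchmarking really pays off.</p>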
<hr>
</section>
<section id="h2o-automl" class="level2">
<h2 class="anchored" data-anchor-id="h2o-automl">4. h2o (AutoML)</h2>
<p><a href="https://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/index.html" rel="nofollow" target="_blank"><code>h2o</code></a> is a distributed machine learning platform with a powerful R interface. Its standout feature is <code>h2o.automl()</code>: automatic model selection, hyperparameter tuning, and stacked-ensemble creation with a single function call. It runs on a local JVM, so Java must be installed.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>This section requires Java (JDK 8+) to be installed. <code>h2o</code> starts a local JVM-based server. If you don’t have Java, skip to the results comparison — the other four frameworks cover the same ground without this dependency.</p>
</div>
</div>
<div class="cell">
<pre>library(h2o)

# Start a local h2o cluster (uses available cores)
h2o.init(nthreads = -1, max_mem_size = &quot;2G&quot;)</pre>
<div class="cell-output cell-output-stdout">
<pre>
H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    C:\Users\RIDDHI~1\AppData\Local\Temp\Rtmp8G4u9C\filec0caad7dcc/h2o_Riddhiman_Roy_started_from_r.out
    C:\Users\RIDDHI~1\AppData\Local\Temp\Rtmp8G4u9C\filec0c1660783f/h2o_Riddhiman_Roy_started_from_r.err


Starting H2O JVM and connecting:  Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         2 seconds 276 milliseconds 
    H2O cluster timezone:       Asia/Kolkata 
    H2O data parsing timezone:  UTC 
    H2O cluster version:        3.44.0.3 
    H2O cluster version age:    2 years, 3 months and 23 days 
    H2O cluster name:           H2O_started_from_R_Riddhiman_Roy_axb153 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   1.98 GB 
    H2O cluster total cores:    24 
    H2O cluster allowed cores:  24 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    R Version:                  R version 4.5.3 (2026-03-11 ucrt) </pre>
</div>
<pre>h2o.no_progress()  # Suppress progress bars

# Convert data to h2o frames
train_h2o &lt;- as.h2o(train_data)
test_h2o  &lt;- as.h2o(test_data)

# Run AutoML — automatic model selection and stacking
aml &lt;- h2o.automl(
  x = c(&quot;Sepal.Length&quot;, &quot;Sepal.Width&quot;, &quot;Petal.Length&quot;, &quot;Petal.Width&quot;),
  y = &quot;Species&quot;,
  training_frame = train_h2o,
  max_models = 10,
  seed = 42
)</pre>
<div class="cell-output cell-output-stdout">
<pre>
22:11:38.3: AutoML: XGBoost is not available; skipping it.
22:11:39.171: _min_rows param, The dataset size is too small to split for min_rows=100.0: must have at least 200.0 (weighted) rows, but have only 105.0.</pre>
</div>
<pre># Leaderboard — best models ranked by cross-validated performance
h2o.get_leaderboard(aml) |&gt; as.data.frame() |&gt; head(5)</pre>
<div class="cell-output cell-output-stdout">
<pre>                                                 model_id mean_per_class_error
1    DeepLearning_grid_1_AutoML_1_20260412_221137_model_1           0.03988095
2                          GBM_2_AutoML_1_20260412_221137           0.05029762
3                          GLM_1_AutoML_1_20260412_221137           0.05029762
4    StackedEnsemble_AllModels_1_AutoML_1_20260412_221137           0.05982143
5 StackedEnsemble_BestOfFamily_1_AutoML_1_20260412_221137           0.05982143
     logloss      rmse        mse
1 0.10262590 0.1802887 0.03250400
2 0.13688347 0.1981121 0.03924839
3 0.09073184 0.1736624 0.03015862
4 0.12933104 0.2002635 0.04010548
5 0.11828660 0.1921656 0.03692762</pre>
</div>
<pre># Predict with the best model
preds_h2o &lt;- h2o.predict(aml@leader, test_h2o)
acc_h2o &lt;- mean(as.vector(preds_h2o$predict) == as.vector(test_h2o$Species))
cat(&quot;Accuracy:&quot;, acc_h2o, &quot;\n&quot;)</pre>
<div class="cell-output cell-output-stdout">
<pre>Accuracy: 0.9777778 </pre>
</div>
<pre>results &lt;- rbind(results, data.frame(
  Framework = &quot;h2o&quot;,
  Model = paste0(&quot;AutoML (&quot;, aml@leader@algorithm, &quot;)&quot;),
  Accuracy = acc_h2o
))

# Shutdown h2o
h2o.shutdown(prompt = FALSE)</pre>
</div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Key Strengths
</div>
</div>
<div class="callout-body-container callout-body">
<ul>
<li><code>h2o.automl()</code> — fully automatic model selection, tuning, and stacked ensembles</li>
<li>Trains GBM, XGBoost, GLM, DRF, and deep learning models in one call</li>
<li>Distributed computing — scales to datasets larger than memory</li>
<li>Built-in leaderboard for model comparison</li>
<li>Production deployment via MOJO/POJO model export</li>
</ul>
</div>
</div>
<hr>
</section>
<section id="qeml" class="level2">
<h2 class="anchored" data-anchor-id="qeml">5. qeML</h2>
<p><a href="https://cran.r-project.org/package=qeML" rel="nofollow" target="_blank"><code>qeML</code></a> (Quick and Easy Machine Learning) takes a different approach: minimize boilerplate. Every algorithm (random forest, gradient boosting, SVM, KNN, LASSO, neural nets, and more) is wrapped behind a one-line <code>qe*()</code> function with a consistent <code>(data, targetName)</code> signature. No formula objects, no matrix conversions, no separate predict calls: just results. It’s ideal for teaching, exploration, and quick model comparisons.</p>
<div class="cell">
<pre>library(qeML)

# qeML convention: pass full data + target name (string)
# It handles train/test splitting internally via holdout
# But to match our split, we'll train on train_data and predict on test_data
# predict() expects new data WITHOUT the target column
test_features &lt;- test_data[, -which(names(test_data) == &quot;Species&quot;)]

# Random Forest (wraps randomForest)
rf_qe &lt;- qeRF(train_data, &quot;Species&quot;)
preds_rf_qe &lt;- predict(rf_qe, test_features)
acc_rf_qe &lt;- mean(preds_rf_qe$predClasses == test_data$Species)
cat(&quot;Random Forest accuracy:&quot;, acc_rf_qe, &quot;\n&quot;)</pre>
<div class="cell-output cell-output-stdout">
<pre>Random Forest accuracy: 0.9777778 </pre>
</div>
<pre># Gradient Boosting (wraps gbm)
gb_qe &lt;- qeGBoost(train_data, &quot;Species&quot;)
preds_gb_qe &lt;- predict(gb_qe, test_features)
acc_gb_qe &lt;- mean(preds_gb_qe$predClasses == test_data$Species)
cat(&quot;Gradient Boosting accuracy:&quot;, acc_gb_qe, &quot;\n&quot;)</pre>
<div class="cell-output cell-output-stdout">
<pre>Gradient Boosting accuracy: 0.9555556 </pre>
</div>
<pre># SVM (wraps e1071)
svm_qe &lt;- qeSVM(train_data, &quot;Species&quot;)
preds_svm_qe &lt;- predict(svm_qe, test_features)
acc_svm_qe &lt;- mean(preds_svm_qe$predClasses == test_data$Species)
cat(&quot;SVM accuracy:&quot;, acc_svm_qe, &quot;\n&quot;)</pre>
<div class="cell-output cell-output-stdout">
<pre>SVM accuracy: 0.9555556 </pre>
</div>
<pre># Use the best-performing qeML model for the results table
best_acc_qe &lt;- max(acc_rf_qe, acc_gb_qe, acc_svm_qe)
best_model_qe &lt;- c(&quot;Random Forest&quot;, &quot;Gradient Boosting&quot;, &quot;SVM&quot;)[
  which.max(c(acc_rf_qe, acc_gb_qe, acc_svm_qe))
]

results &lt;- rbind(results, data.frame(
  Framework = &quot;qeML&quot;,
  Model = paste0(best_model_qe, &quot; (qe wrapper)&quot;),
  Accuracy = best_acc_qe
))</pre>
</div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Key Strengths
</div>
</div>
<div class="callout-body-container callout-body">
<ul>
<li>One-line model fitting: <code>qeRF(data, &quot;target&quot;)</code> — no formula, no matrix, no recipe</li>
<li>20+ algorithms behind a uniform <code>qe*()</code> interface (RF, GBM, SVM, KNN, LASSO, neural nets, and more)</li>
<li><code>qeCompare()</code> lets you benchmark multiple methods in a single call</li>
<li>Built-in holdout evaluation</li>
<li>Lowest learning curve of any framework showcased here</li>
</ul>
</div>
</div>
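<p>The <code>qeCompare()</code> call mentioned above condenses the whole comparison into a single line. A sketch (the argument order follows the qeML documentation; check <code>?qeCompare</code> on your installed version, as the interface has evolved across releases):</p>

```r
library(qeML)

# Repeated-holdout comparison of three wrappers on the same data;
# prints the mean test error for each method
qeCompare(iris, "Species", c("qeRF", "qeSVM", "qeKNN"), nReps = 10)
```
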
<hr>
</section>
<section id="results-comparison" class="level2">
<h2 class="anchored" data-anchor-id="results-comparison">Results Comparison</h2>
<p>All five frameworks were trained on the same 70/30 split of the <code>iris</code> dataset. Here’s how they stack up:</p>
<div class="cell">
<pre>library(knitr)

results &lt;- results %&gt;% arrange(desc(Accuracy))
kable(results, digits = 4, caption = &quot;Test Set Accuracy by Framework&quot;)</pre>
<div class="cell-output-display">
<table class="caption-top table table-sm table-striped small">
<caption>Test Set Accuracy by Framework</caption>
<thead>
<tr class="header">
<th style="text-align: left;">Framework</th>
<th style="text-align: left;">Model</th>
<th style="text-align: right;">Accuracy</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">tidymodels</td>
<td style="text-align: left;">Random Forest (ranger)</td>
<td style="text-align: right;">0.9778</td>
</tr>
<tr class="even">
<td style="text-align: left;">h2o</td>
<td style="text-align: left;">AutoML (deeplearning)</td>
<td style="text-align: right;">0.9778</td>
</tr>
<tr class="odd">
<td style="text-align: left;">qeML</td>
<td style="text-align: left;">Random Forest (qe wrapper)</td>
<td style="text-align: right;">0.9778</td>
</tr>
<tr class="even">
<td style="text-align: left;">caret</td>
<td style="text-align: left;">Random Forest (rf)</td>
<td style="text-align: right;">0.9556</td>
</tr>
<tr class="odd">
<td style="text-align: left;">mlr3</td>
<td style="text-align: left;">Random Forest (ranger)</td>
<td style="text-align: right;">0.9556</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>On a clean, small dataset like <code>iris</code>, accuracy differences are minimal. The real differentiator is the API and workflow each framework provides. On real-world datasets the choice of framework matters more for how you structure your code than for raw accuracy.</p>
</section>
<section id="closing-thoughts" class="level2">
<h2 class="anchored" data-anchor-id="closing-thoughts">Closing thoughts</h2>
<p>There is no single “best” ML framework in R; the right choice depends on the task at hand:</p>
<ul>
<li><strong>Start with <code>tidymodels</code></strong> for a modern, composable, production-ready pipeline.</li>
<li><strong>Try <code>qeML</code></strong> for the fastest path from data to results.</li>
<li><strong>Use <code>h2o</code></strong> for automatic model selection and stacking with minimal effort.</li>
<li><strong>Consider <code>mlr3</code></strong> for rigorous benchmarking and advanced pipeline composition.</li>
<li><strong>Stick with <code>caret</code></strong> if you are maintaining existing code or prefer its battle-tested simplicity.</li>
</ul>


</section>

 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://rtichoke.netlify.app/posts/ml-frameworks-in-r.html"> R&#039;tichoke</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/machine-learning-frameworks-in-r/">Machine Learning Frameworks in R</a>]]></content:encoded>
					
		
		<enclosure url="https://rtichoke.netlify.app/images/ml-frameworks.png" length="0" type="image/png" />

		<post-id xmlns="com-wordpress:feed-additions:1">400507</post-id>	</item>
		<item>
		<title>New R Package {bdlnm} Released on CRAN: Bayesian Distributed Lag Non-Linear Models in R via INLA</title>
		<link>https://www.r-bloggers.com/2026/04/new-r-package-bdlnm-released-on-cran-bayesian-distributed-lag-non-linear-models-in-r-via-inla/</link>
		
		<dc:creator><![CDATA[Pau Satorra]]></dc:creator>
		<pubDate>Fri, 10 Apr 2026 17:14:30 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://r-posts.com/?p=18982</guid>

					<description><![CDATA[<p>CRAN, GitHub TL;DR: {bdlnm} brings Bayesian Distributed Lag Non-Linear Models (B-DLNMs) to R using INLA, allowing you to model complex DLNMs, quantify uncertainty, and produce rich visualizations. Background Climate change is increasing exposure to extreme environmental conditions such as heatwaves and air pollution. However, these exposures rarely have immediate effects. ...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/new-r-package-bdlnm-released-on-cran-bayesian-distributed-lag-non-linear-models-in-r-via-inla/">New R Package {bdlnm} Released on CRAN: Bayesian Distributed Lag Non-Linear Models in R via INLA</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="http://r-posts.com/new-r-package-bdlnm-released-on-cran-bayesian-distributed-lag-non-linear-models-in-r-via-inla/"> R-posts.com</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p><a href="https://cran.r-project.org/package=bdlnm" rel="nofollow" target="_blank">CRAN</a>, <a href="https://github.com/pasahe/bdlnm" rel="nofollow" target="_blank">GitHub</a></p>
<blockquote class="blockquote"><strong>TL;DR</strong>: {bdlnm} brings Bayesian Distributed Lag Non-Linear Models (B-DLNMs) to R using INLA, allowing you to model complex DLNMs, quantify uncertainty, and produce rich visualizations.</blockquote>
<section id="background" class="level1">
<h1>Background</h1>
<p>Climate change is increasing exposure to extreme environmental conditions such as heatwaves and air pollution. However, these exposures rarely have immediate effects. For example:</p>
<ul>
	<li>A heatwave today may increase mortality several days later</li>
	<li>Air pollution can have cumulative and delayed impacts</li>
</ul>
<p>Distributed Lag Non-Linear Models (DLNMs) are the standard framework for studying these effects. They simultaneously model:</p>
<ul>
	<li>How risk changes with exposure level (exposure-response)</li>
	<li>How risk evolves over time (lag-response)</li>
</ul>
<p>When these effects are non-linear, splines are typically used to define the two relationships. The two spline bases are then combined through a cross-basis function.</p>
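<p>With the {dlnm} package, building such a cross-basis is a single function call. A sketch using the <code>chicagoNMMAPS</code> data shipped with {dlnm} (the spline choices here are illustrative):</p>

```r
library(dlnm)

# Natural-spline bases on both dimensions, combined into a cross-basis:
# rows = observations, columns = exposure-basis x lag-basis combinations
cb <- crossbasis(
  chicagoNMMAPS$temp,
  lag    = 21,
  argvar = list(fun = "ns", df = 5),  # exposure-response basis
  arglag = list(fun = "ns", df = 4)   # lag-response basis
)
dim(cb)  # one column per basis combination: 5 * 4 = 20
```

<p>The resulting matrix enters a regression model like any other set of covariates; {bdlnm} reuses this construction on the Bayesian side.</p>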
<p>As datasets become larger and more complex (e.g., studies with different regions and longer time periods), classical approaches show limitations. Bayesian DLNMs extend this framework by:</p>
<ul>
	<li style="list-style-type: none">
<ul>
	<li>Supporting more flexible model structures</li>
	<li>Providing full posterior distributions</li>
	<li>Enabling richer uncertainty quantification</li>
</ul>
</li>
</ul>
<p><span>The new {bdlnm} package extends the framework of the {dlnm} package to a Bayesian setting, using Integrated Nested</span> Laplace Approximation (INLA), a fast alternative to MCMC for Bayesian inference.</p>
</section>
<section id="installing-and-loading-the-package" class="level1">
<h1>Installing and loading the package</h1>
<p>As of March 2026, the package is available on CRAN:</p>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb1">
<pre>install.packages(&quot;bdlnm&quot;)
library(bdlnm)</pre>
</div>
</div>
</div>
<p>INLA version 23.4.24 (stable) or newer must be installed beforehand. You can install the latest stable INLA version with:</p>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb2">
<pre>install.packages(
  &quot;INLA&quot;,
  repos = c(
    getOption(&quot;repos&quot;),
    INLA = &quot;https://inla.r-inla-download.org/R/stable&quot;
  ),
  dep = TRUE
)</pre>
</div>
</div>
</div>
<p>Now, let’s load all the libraries we will need for this short tutorial:</p>
<div class="cell">
<details class="code-fold">
<summary>Load required libraries</summary>
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb3">
<pre># DLNMs and splines
library(dlnm)
library(splines)

# Data manipulation
library(dplyr)
library(reshape2)
library(stringr)
library(lubridate)

# Visualization
library(ggplot2)
library(gganimate)
library(ggnewscale)
library(patchwork)
library(scales)
library(plotly)

# Tables
library(gt)

# Execution time
library(tictoc)</pre>
</div>
</div>
</details>
</div>
</section>
<section id="hands-on-example" class="level1">
<h1>Hands-on example</h1>
<p>We use the built-in <code>london</code> dataset, which contains daily mean temperature and mortality counts (age 75+) for London from 2000 to 2012.</p>
<p>Before fitting any model, it is useful to explore the raw data. This plot shows daily mean temperature and mortality for the 75+ age group in London from 2000 to 2012, providing a first look at the time series we are trying to model:</p>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb4">
<pre>col_mort &lt;- &quot;#2f2f2f&quot;
col_temp &lt;- &quot;#8e44ad&quot;

# Scaling parameters
a &lt;- (max(london$mort_75plus) - min(london$mort_75plus)) /
  (max(london$tmean) - min(london$tmean))
b &lt;- min(london$mort_75plus) - min(london$tmean) * a

p &lt;- ggplot(london, aes(x = yday(date))) +
  geom_line(
    aes(y = a * tmean + b, color = &quot;Mean Temperature&quot;),
    linewidth = 0.4
  ) +
  geom_line(
    aes(y = mort_75plus, color = &quot;Daily Mortality (+75 years)&quot;),
    linewidth = 0.4
  ) +
  facet_wrap(~year, ncol = 3) +
  scale_y_continuous(
    name = &quot;Daily Mortality (+75 years)&quot;,
    breaks = seq(0, 225, by = 50),
    sec.axis = sec_axis(
      name = &quot;Mean Temperature (°C)&quot;,
      transform = ~ (. - b) / a,
      breaks = seq(-10, 30, by = 10)
    )
  ) +
  scale_x_continuous(
    breaks = yday(as.Date(paste0(
      &quot;2000-&quot;,
      c(&quot;01&quot;, &quot;03&quot;, &quot;05&quot;, &quot;07&quot;, &quot;09&quot;, &quot;11&quot;),
      &quot;-01&quot;
    ))),
    labels = c(&quot;Jan&quot;, &quot;Mar&quot;, &quot;May&quot;, &quot;Jul&quot;, &quot;Sep&quot;, &quot;Nov&quot;),
    expand = c(0.01, 0)
  ) +
  scale_color_manual(
    values = c(
      &quot;Daily Mortality (+75 years)&quot; = col_mort,
      &quot;Mean Temperature&quot; = col_temp
    )
  ) +
  labs(x = NULL, color = NULL) +
  guides(color = &quot;none&quot;) +
  theme_minimal() +
  theme(
    axis.title.y.left = element_text(
      color = col_mort,
      face = &quot;bold&quot;,
      margin = margin(r = 8)
    ),
    axis.title.y.right = element_text(
      color = col_temp,
      face = &quot;bold&quot;,
      margin = margin(l = 8)
    ),
    axis.text.y.left = element_text(color = col_mort),
    axis.text.y.right = element_text(color = col_temp)
  ) +
  transition_reveal(as.numeric(date))

animate(p, nframes = 300, fps = 10, end_pause = 100)</pre>
</div>
</div>
<div class="cell-output-display">
<figure class="figure">
<p><img loading="lazy" fetchpriority="high" decoding="async" src="https://i0.wp.com/r-posts.com/wp-content/uploads/2026/03/tsevolution.gif?w=450" alt="" class="alignnone size-full wp-image-19043" data-recalc-dims="1" /></p>
</figure>
</div>
</div>
</section>
<section id="model-overview" class="level1">
<h1>Model overview</h1>
<p>Conceptually, DLNMs model:</p>
<ul>
	<li style="list-style-type: none">
<ul>
	<li>
<p>Exposure-response: how risk changes with exposure level</p>
</li>
	<li>
<p>Lag-response: how risk unfolds over time</p>
</li>
</ul>
</li>
</ul>
<p>A typical model is:</p>
<div>
<math display="block"> <msub><mi>Y</mi><mi>t</mi></msub> <mo>∼</mo> <mtext>Poisson</mtext> <mo stretchy="false">(</mo> <msub><mi>μ</mi><mi>t</mi></msub> <mo stretchy="false">)</mo> </math>
</div>
<div>
<math display="block"> <mi>log</mi> <mo stretchy="false">(</mo> <msub><mi>μ</mi><mi>t</mi></msub> <mo stretchy="false">)</mo> <mo>=</mo> <mi>α</mi> <mo>+</mo> <mi>cb</mi> <mo stretchy="false">(</mo> <msub><mi>x</mi><mi>t</mi></msub> <mo>,</mo> <mo>…</mo> <mo>,</mo> <msub> <mi>x</mi> <mrow><mi>t</mi><mo>−</mo><mi>L</mi></mrow> </msub> <mo stretchy="false">)</mo> <mo>·</mo> <mi>β</mi> <mo>+</mo> <munder> <mo>∑</mo> <mrow><mi>k</mi></mrow> </munder> <msub><mi>γ</mi><mi>k</mi></msub> <msub> <mi>u</mi> <mrow><mi>k</mi><mi>t</mi></mrow> </msub> </math>
</div>
<p>where:</p>
<ul>
	<li style="list-style-type: none">
<ul>
	<li><i>α</i> is the intercept</li>
	<li><i>cb</i>(·) is the cross-basis function, defining both the exposure-response and lag-response relationships</li>
	<li><i>β</i> are the coefficients associated with the cross-basis terms</li>
	<li><i>u</i><sub>kt</sub> are time-varying covariates with corresponding coefficients <i>γ</i><sub>k</sub></li>
</ul>
</li>
</ul>
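<p>The role of the cross-basis term can be illustrated with a toy unconstrained distributed-lag Poisson regression, where each lag receives its own coefficient (all data and weights here are hypothetical; a real DLNM replaces the raw lag matrix with the cross-basis to smooth over both dimensions):</p>

```r
set.seed(42)
n <- 500
L <- 3
x <- rnorm(n)                          # hypothetical exposure series
w <- c(0.5, 0.3, 0.15, 0.05)           # true lag weights, decaying with lag

# Lagged-exposure matrix (complete rows only)
Q <- sapply(0:L, function(l) x[(L + 1 - l):(n - l)])
y <- rpois(n - L, exp(0.5 + Q %*% w))  # Poisson counts, log link

# One coefficient per lag; estimates should be close to w
fit <- glm(y ~ Q, family = poisson)
round(coef(fit)[-1], 2)
```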
</section>
<section id="model-specification-setup" class="level2">
<h2 class="anchored" data-anchor-id="model-specification-setup">Model specification & setup</h2>
<p>Before fitting the model, we have to define the spline-based exposure-response and lag-response functions using the {dlnm} package.</p>
<p>For our example, we will use specifications common in the temperature-mortality literature:</p>
<ul>
	<li style="list-style-type: none">
<ul>
	<li>
<p>Exposure-response: natural spline with three knots placed at the 10th, 75th, and 90th percentiles of daily mean temperature</p>
</li>
	<li>
<p>Lag-response: natural spline with three knots equally spaced on the log scale up to a maximum lag of 21 days</p>
</li>
</ul>
</li>
</ul>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb5">
<pre># Exposure-response and lag-response spline parameters
dlnm_var &lt;- list(
  var_prc = c(10, 75, 90),
  var_fun = &quot;ns&quot;,
  lag_fun = &quot;ns&quot;,
  max_lag = 21,
  lagnk = 3
)

# Cross-basis parameters
argvar &lt;- list(
  fun = dlnm_var$var_fun,
  knots = quantile(london$tmean, dlnm_var$var_prc / 100, na.rm = TRUE),
  Bound = range(london$tmean, na.rm = TRUE)
)

arglag &lt;- list(
  fun = dlnm_var$lag_fun,
  knots = logknots(dlnm_var$max_lag, nk = dlnm_var$lagnk)
)

# Create crossbasis
cb &lt;- crossbasis(london$tmean, lag = dlnm_var$max_lag, argvar, arglag)</pre>
</div>
</div>
</div>
<p>As is common in these analyses, we will also control for seasonality and long-term trends in the mortality time series using a natural spline of time with 8 degrees of freedom per year:</p>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb6">
<pre>seas &lt;- ns(london$date, df = round(8 * length(london$date) / 365.25))</pre>
</div>
</div>
</div>
<p>Finally, we also have to define the temperature values for which predictions will be generated:</p>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb7">
<pre>temp &lt;- round(seq(min(london$tmean), max(london$tmean), by = 0.1), 1)</pre>
</div>
</div>
</div>
</section>
<section id="fit-the-model" class="level2">
<h2 class="anchored" data-anchor-id="fit-the-model">Fit the model</h2>
<p>We fit the previously defined Bayesian DLNM using the <code>bdlnm()</code> function. We draw 1000 samples from the posterior distribution and set a seed for reproducibility:</p>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb8">
<pre>tictoc::tic()
mod &lt;- bdlnm(
  mort_75plus ~ cb + factor(dow) + seas,
  data = london,
  family = &quot;poisson&quot;,
  sample.arg = list(n = 1000, seed = 5243)
)
tictoc::toc()</pre>
<div class="cell-output cell-output-stdout">
<pre>8.33 sec elapsed</pre>
</div>
</div>
</div>
</div>
<p>Internally, <code>bdlnm()</code>:</p>
<ul>
	<li style="list-style-type: none">
<ul>
	<li>
<p>fits the model using INLA</p>
</li>
	<li>
<p>returns posterior samples for all parameters</p>
</li>
</ul>
</li>
</ul>
</section>
<section id="minimum-mortality-temperature" class="level2">
<h2 class="anchored" data-anchor-id="minimum-mortality-temperature">Minimum mortality temperature</h2>
<p>We estimate the minimum mortality temperature (MMT), defined as the temperature at which the overall mortality risk is minimized. This optimal value will later be used to center the estimated relative risks.</p>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb9">
<pre>tictoc::tic()
mmt &lt;- optimal_exposure(mod, exp_at = temp)
tictoc::toc()</pre>
</div>
</div>
<div class="cell-output cell-output-stdout">
<pre>7.3 sec elapsed</pre>
</div>
</div>
<p>Unlike the frequentist approach, the Bayesian framework directly provides the full posterior distribution of the MMT. It is useful to inspect this distribution to assess whether multiple candidate optimal exposure values exist and to verify that the median provides a reasonable centering value:</p>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb11">
<pre>ggplot(as.data.frame(mmt$est), aes(x = mmt$est)) +
  geom_histogram(
    fill = &quot;#A8C5DA&quot;,
    bins = length(unique(mmt$est)),
    alpha = 0.6,
    color = &quot;white&quot;
  ) +
  geom_density(
    aes(y = after_stat(density) * length(mmt$est) / length(unique(mmt$est))),
    color = &quot;#2E5E7E&quot;,
    linewidth = 1.2,
    adjust = 2 # higher values give a smoother density
  ) +
  geom_vline(
    xintercept = mmt$summary[&quot;0.5quant&quot;],
    color = &quot;#2E5E7E&quot;,
    linewidth = 1.1,
    linetype = &quot;dashed&quot;
  ) +
  scale_x_continuous(breaks = seq(min(mmt$est), max(mmt$est), by = 0.1)) +
  labs(x = &quot;Temperature (°C)&quot;, y = &quot;Frequency&quot;) +
  theme_minimal()</pre>
</div>
</div>
<div class="cell-output-display">
<figure class="figure">
<p><img loading="lazy" decoding="async" src="https://i0.wp.com/r-posts.com/wp-content/uploads/2026/04/mmt_plot.png?w=450" alt="" class="alignnone size-full wp-image-19044" data-recalc-dims="1" /></p>
</figure>
</div>
</div>
<p>The posterior distribution of the MMT is concentrated around 18.9°C and is unimodal, so the median is a stable centering value for the relative risk estimates.</p>
<p>The posterior distribution of the MMT can also be visualized directly using the package’s <code>plot()</code> method: <code>plot(mmt)</code>.</p>
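<p>The logic behind the MMT posterior can be sketched in base R (a toy example with a hypothetical U-shaped risk curve, not the package’s internals): each posterior sample of the exposure-response curve yields one candidate minimum, and the collection of minima forms the posterior of the MMT.</p>

```r
set.seed(1)
temp_grid <- seq(0, 30, by = 0.5)
truth <- (temp_grid - 19)^2 / 100   # hypothetical U-shaped risk, minimum at 19 °C

# Pretend posterior: 1000 noisy versions of the curve (rows = samples)
curves <- t(replicate(1000, truth + rnorm(length(temp_grid), sd = 0.05)))

mmt_samples <- temp_grid[apply(curves, 1, which.min)]  # one MMT per sample
quantile(mmt_samples, c(0.025, 0.5, 0.975))            # posterior summary
```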
</section>
<section id="predict-exposure-lag-response-effects" class="level2">
<h2 class="anchored" data-anchor-id="predict-exposure-lag-response-effects">Predict exposure-lag-response effects</h2>
<p>We predict the exposure-lag-response association between temperature and mortality from the fitted model at the supplied temperature grid:</p>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb12">
<pre>cen &lt;- mmt$summary[[&quot;0.5quant&quot;]]
tictoc::tic()
cpred &lt;- bcrosspred(mod, exp_at = temp, cen = cen)
tictoc::toc()</pre>
</div>
</div>
<div class="cell-output cell-output-stdout">
<pre>6.83 sec elapsed</pre>
</div>
</div>
<blockquote class="blockquote">
<p>Centering at the MMT means that relative risks (RR) are interpreted relative to this optimal temperature with minimum mortality.</p>
</blockquote>
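<p>The centering step itself is simple arithmetic on the log-risk scale; a toy sketch with a hypothetical log-risk curve <code>f()</code> (not the package’s internals):</p>

```r
f <- function(x) 0.002 * (x - 19)^2  # hypothetical log-risk, minimum at 19 °C
cen <- 19                            # centering value (the MMT)

# RR relative to the MMT: exp(f(x) - f(cen)); equals 1 at x = cen
rr <- exp(f(c(0, 19, 25)) - f(cen))
rr
```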
<p>Several visualizations can be produced from these predictions. While simpler plots can be created with the package’s <code>plot()</code> method, here we will build richer <code>ggplot2</code> and <code>plotly</code> visualizations.</p>
</section>
<section id="d-exposure-lag-response-surface" class="level2">
<h2 class="anchored" data-anchor-id="d-exposure-lag-response-surface">3D exposure-lag-response surface</h2>
<p>We can plot the full exposure-lag-response association as a 3-D surface:</p>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb14">
<pre>matRRfit_median &lt;- cpred$matRRfit.summary[,, &quot;0.5quant&quot;]
x &lt;- rownames(matRRfit_median)
y &lt;- colnames(matRRfit_median)
z &lt;- t(matRRfit_median)

zmin &lt;- min(z, na.rm = TRUE)
zmax &lt;- max(z, na.rm = TRUE)
mid &lt;- (1 - zmin) / (zmax - zmin)

plot_ly() |&gt;
  add_surface(
    x = x,
    y = y,
    z = z,
    surfacecolor = z,
    cmin = zmin,
    cmax = zmax,
    colorscale = list(
      c(0, &quot;#00696e&quot;),
      c(mid * 0.5, &quot;#80c8c8&quot;),
      c(mid, &quot;#f5f0e8&quot;),
      c(mid + (1 - mid) * 0.5, &quot;#c2714f&quot;),
      c(1, &quot;#6b1c1c&quot;)
    ),
    colorbar = list(title = &quot;RR&quot;)
  ) |&gt;
  add_surface(
    x = x,
    y = y,
    z = matrix(1, nrow = length(y), ncol = length(x)),
    colorscale = list(c(0, &quot;black&quot;), c(1, &quot;black&quot;)),
    opacity = 0.4,
    showscale = FALSE
  ) |&gt;
  layout(
    title = &quot;Exposure-Lag-Response Surface&quot;,
    scene = list(
      xaxis = list(title = &quot;Temperature (°C)&quot;),
      yaxis = list(title = &quot;Lag&quot;, tickvals = y, ticktext = gsub(&quot;lag&quot;, &quot;&quot;, y)),
      zaxis = list(title = &quot;RR&quot;),
      camera = list(eye = list(x = 1.5, y = -1.8, z = 0.8))
    )
  )</pre>
</div>
</div>
<figure style="text-align: center"><a href="https://pasahe.github.io/3dplot_bdlnm/" rel="nofollow" target="_blank"> <img loading="lazy" decoding="async" src="https://i2.wp.com/r-posts.com/wp-content/uploads/2026/04/3dplot_bdlnm.png?w=450" alt="" class="alignnone size-full wp-image-19049" data-recalc-dims="1" /> </a>
<figcaption><span style="font-size: 10pt">Click the image to explore the interactive Plotly version</span></figcaption>
</figure>
<div class="cell-output-display"><br />
The surface reveals two distinct risk regions. Hot temperatures produce a sharp, acute risk concentrated at the shortest lags, peaking at lag 0 and dissipating rapidly thereafter. Cold temperatures produce a more modest, gradual increase in risk over the first lags that does not fully disappear at longer lags. Intermediate temperatures near the MMT sit close to the RR = 1 reference plane across all lags.<br />
<br />
</div>
</div>
<p>The differential lag structure observed for heat- and cold-related mortality is consistent with known physiological mechanisms. Heat-related mortality tends to occur rapidly after exposure due to acute physiological stress, whereas cold-related mortality develops more gradually through delayed cardiovascular and respiratory effects, leading to increasing risk over longer lag periods.</p>
</section>
<section id="lag-response-curves" class="level2">
<h2 class="anchored" data-anchor-id="lag-response-curves">Lag-response curves</h2>
<p>We can also visualize slices of this surface. For example, the lag-response relationship at different temperature values:</p>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb15">
<pre>matRRfit &lt;- cbind(
  melt(cpred$matRRfit.summary[,, &quot;0.5quant&quot;], value.name = &quot;RR&quot;),
  RR_lci = melt(
    cpred$matRRfit.summary[,, &quot;0.025quant&quot;],
    value.name = &quot;RR_lci&quot;
  )$RR_lci,
  RR_uci = melt(
    cpred$matRRfit.summary[,, &quot;0.975quant&quot;],
    value.name = &quot;RR_uci&quot;
  )$RR_uci
) |&gt;
  rename(temperature = Var1, lag = Var2) |&gt;
  mutate(
    lag = as.numeric(gsub(&quot;lag&quot;, &quot;&quot;, lag))
  )

temps &lt;- cpred$exp_at

p &lt;- ggplot() +
  # Lag-responses curves colored by temperature
  geom_line(
    data = matRRfit,
    aes(x = lag, y = RR, group = temperature, color = temperature),
    alpha = 0.35,
    linewidth = 0.35
  ) +
  scale_color_gradientn(
    colours = c(
      &quot;#2166ac&quot;,
      &quot;#4393c3&quot;,
      &quot;#92c5de&quot;,
      &quot;#d1e5f0&quot;,
      &quot;#f7f7f7&quot;,
      &quot;#fddbc7&quot;,
      &quot;#f4a582&quot;,
      &quot;#d6604d&quot;,
      &quot;#b2182b&quot;
    ),
    name = &quot;Temperature&quot;
  ) +
  # Start a new color scale for highlighted curves
  ggnewscale::new_scale_color() +
  # RR = 1 reference
  geom_hline(
    yintercept = 1,
    linetype = &quot;dashed&quot;,
    color = &quot;grey30&quot;,
    linewidth = 0.5
  ) +
  scale_x_continuous(breaks = cpred$lag_at) +
  scale_y_continuous(trans = &quot;log10&quot;, breaks = pretty_breaks(6)) +
  labs(
    title = &quot;Lag-response curves by temperature&quot;,
    x = &quot;Lag (days)&quot;,
    y = &quot;Relative Risk (RR)&quot;
  ) +
  theme_minimal() +
  theme(legend.position = &quot;top&quot;, panel.grid.minor.x = element_blank()) +
  transition_states(
    temperature,
    transition_length = 1,
    state_length = 0
  ) +
  shadow_mark(past = TRUE, future = FALSE, alpha = 0.6)

animate(p, nframes = 300, fps = 15, end_pause = 100)</pre>
</div>
</div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p>Cold temperatures (blue) show a gradual rise in risk over the initial lags followed by a slow decline that does not fully disappear at the longer lags. Hot temperatures (red) show a different pattern: risk is highest immediately at lag 0, then drops rapidly and largely dissipates after the first lags:<br />
<br />
<img loading="lazy" decoding="async" src="https://i2.wp.com/r-posts.com/wp-content/uploads/2026/04/lag_response.gif?w=450" alt="" class="alignnone size-full wp-image-19045" data-recalc-dims="1" /></p>
</figure>
</div>
</div>
</div>
</section>
<section id="exposure-responses-curves" class="level2">
<h2 class="anchored" data-anchor-id="exposure-responses-curves">Exposure-response curves</h2>
<p>We can also plot the exposure-response curves by lag, together with the overall cumulative curve across the full lag period:</p>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb16">
<pre>allRRfit &lt;- data.frame(
  temperature = as.numeric(rownames(cpred$allRRfit.summary)),
  lag = &quot;overall&quot;,
  RR = cpred$allRRfit.summary[, &quot;0.5quant&quot;],
  RR_lci = cpred$allRRfit.summary[, &quot;0.025quant&quot;],
  RR_uci = cpred$allRRfit.summary[, &quot;0.975quant&quot;]
)

RRfit &lt;- rbind(matRRfit, allRRfit)

# Split data
RRfit_lags &lt;- RRfit |&gt;
  filter(!lag %in% c(&quot;overall&quot;)) |&gt;
  mutate(lag = as.numeric(lag))
RRfit_overall &lt;- RRfit |&gt;
  filter(lag %in% c(&quot;overall&quot;))

temps &lt;- cpred$exp_at
t_cold &lt;- temps[which.min(abs(temps - quantile(temps, 0.01)))]
t_hot &lt;- temps[which.min(abs(temps - quantile(temps, 0.99)))]

# Top plot: exposure-response curves for each lag and overall
p_main &lt;- ggplot() +
  # Background: all lags, fading from vivid (small) to pale (large)
  geom_line(
    data = RRfit_lags,
    aes(x = temperature, y = RR, group = lag, color = lag),
    linewidth = 0.8
  ) +
  scale_color_gradientn(
    colours = c(
      &quot;black&quot;,
      &quot;#2b1d2f&quot;,
      &quot;#4a2f5e&quot;,
      &quot;#6a4c93&quot;,
      &quot;#8b6bb8&quot;,
      &quot;#b39cdb&quot;,
      &quot;#d8c9f1&quot;,
      &quot;#f3eef5&quot;
    ),
    values = scales::rescale(c(0, 0.5, 1, 2, 3, 4, 5, 10, 20))
  ) +
  new_scale_color() +
  new_scale_fill() +
  # Credible intervals
  geom_ribbon(
    data = RRfit_overall,
    aes(
      x = temperature,
      ymin = RR_lci,
      ymax = RR_uci,
      fill = &quot;1&quot;
    ),
    alpha = 0.2
  ) +
  # Highlighted curves
  geom_line(
    data = RRfit_overall,
    aes(x = temperature, y = RR, color = &quot;1&quot;),
    linewidth = 1.2
  ) +
  geom_hline(
    yintercept = 1,
    linetype = &quot;dashed&quot;
  ) +
  scale_color_manual(values = &quot;#a6761d&quot;, labels = &quot;Overall (CrI95%)&quot;) +
  scale_fill_manual(values = &quot;#a6761d&quot;, labels = &quot;Overall (CrI95%)&quot;) +
  scale_y_continuous(
    transform = &quot;log10&quot;,
    breaks = sort(c(0.8, pretty_breaks(5)(c(0.8, 4))))
  ) +
  labs(
    x = NULL,
    y = &quot;Relative Risk (RR)&quot;,
    color = NULL,
    fill = NULL
  ) +
  theme_minimal() +
  theme(
    legend.position = &quot;top&quot;,
    axis.text.x = element_blank(),
    plot.margin = margin(8, 8, 0, 8)
  )

# Bottom plot: histogram with percentile lines
p_hist &lt;- ggplot(london, aes(x = tmean)) +
  geom_histogram(
    aes(y = after_stat(density), fill = after_stat(x)),
    binwidth = 0.5,
    color = &quot;black&quot;,
    linewidth = 0.2
  ) +
  geom_vline(
    xintercept = t_cold,
    linetype = &quot;dashed&quot;,
    color = &quot;#053061&quot;,
    linewidth = 0.6
  ) +
  geom_vline(
    xintercept = t_hot,
    linetype = &quot;dashed&quot;,
    color = &quot;#67001f&quot;,
    linewidth = 0.6
  ) +
  geom_vline(
    xintercept = cen,
    linetype = &quot;dashed&quot;,
    color = &quot;grey20&quot;,
    linewidth = 0.6
  ) +
  annotate(
    &quot;text&quot;,
    x = t_cold,
    y = Inf,
    label = &quot;1st pct&quot;,
    vjust = 1.4,
    hjust = 1.1,
    size = 3.2,
    color = &quot;#053061&quot;
  ) +
  annotate(
    &quot;text&quot;,
    x = t_hot,
    y = Inf,
    label = &quot;99th pct&quot;,
    vjust = 1.4,
    hjust = -0.1,
    size = 3.2,
    color = &quot;#67001f&quot;
  ) +
  annotate(
    &quot;text&quot;,
    x = cen,
    y = Inf,
    label = &quot;MMT&quot;,
    vjust = 1.4,
    hjust = -0.1,
    size = 3.2,
    color = &quot;grey20&quot;
  ) +
  scale_x_continuous(limits = range(cpred$exp_at)) +
  scale_fill_gradientn(
    colours = c(
      &quot;#053061&quot;,
      &quot;#2166ac&quot;,
      &quot;#4393c3&quot;,
      &quot;#92c5de&quot;,
      &quot;#d1e5f0&quot;,
      &quot;#f7f7f7&quot;,
      &quot;#fddbc7&quot;,
      &quot;#f4a582&quot;,
      &quot;#d6604d&quot;,
      &quot;#b2182b&quot;,
      &quot;#67001f&quot;
    ),
    name = &quot;Temperature&quot;
  ) +
  labs(x = &quot;Temperature (°C)&quot;, y = &quot;Density&quot;) +
  theme_minimal() +
  theme(
    plot.margin = margin(20, 8, 8, 8),
    legend.position = &quot;bottom&quot;
  )

# Combine them:
p_main / p_hist + plot_layout(heights = c(3, 1))</pre>
</div>
</div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img loading="lazy" decoding="async" src="https://i1.wp.com/r-posts.com/wp-content/uploads/2026/04/exposure_response.png?w=450" alt="" class="alignnone size-full wp-image-19046" data-recalc-dims="1" /><br />
The overall cumulative curve (mustard) is clearly asymmetric: risk increases on both sides of the MMT, but the rise is much steeper for hot temperatures than for cold temperatures. The lag-0 curve (black), which reflects the immediate effect, behaves differently for cold than for heat: it is below 1 at cold temperatures (reflecting the delayed nature of cold effects) and increases approximately linearly for heat. The histogram confirms that most London days fall between 5°C and 20°C, so extreme temperatures, despite their high individual risks, are relatively rare events.</p>
</figure>
</div>
</div>
</div>
</section>
<section id="attributable-risk" class="level2">
<h2 class="anchored" data-anchor-id="attributable-risk">Attributable risk</h2>
<p>We can also compute attributable numbers and fractions from a B-DLNM, which allows us to quantify the impact of all observed exposures on mortality in the 75+ population. We compute the number of mortality events attributable to temperature exposure (the attributable number) and the fraction of all mortality events this represents (the attributable fraction).</p>
<p>Two different perspectives can be used:</p>
<ul>
	<li style="list-style-type: none">
<ul>
	<li>
<p>Backward (<code>dir = &quot;back&quot;</code>): how many of today’s deaths are explained by past temperature exposures?</p>
</li>
	<li>
<p>Forward (<code>dir = &quot;forw&quot;</code>): how many future deaths will today’s temperature exposure cause?</p>
</li>
</ul>
</li>
</ul>
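<p>The computation behind these quantities can be sketched in base R (hypothetical numbers; a simplified version of the usual attributable-risk definitions for DLNMs, where the daily attributable fraction is 1 − exp(−η<sub>t</sub>) for a cumulative log-RR η<sub>t</sub>):</p>

```r
log_rr <- c(0.00, 0.12, 0.30, 0.05)  # hypothetical daily cumulative log-RRs
cases  <- c(100, 110, 130, 105)      # hypothetical daily death counts

af <- 1 - exp(-log_rr)               # daily attributable fractions
an <- af * cases                     # daily attributable numbers
c(AN_total = sum(an), AF_total = sum(an) / sum(cases))
```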
<p>Let’s use the forward perspective, which is more commonly used:</p>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb17">
<pre>tictoc::tic()
attr_forw &lt;- attributable(
  mod,
  london,
  name_date = &quot;date&quot;,
  name_exposure = &quot;tmean&quot;,
  name_cases = &quot;mort_75plus&quot;,
  cen = cen,
  dir = &quot;forw&quot;
)
tictoc::toc()</pre>
</div>
</div>
<div class="cell-output cell-output-stdout">
<pre>110.12 sec elapsed</pre>
</div>
</div>
</section>
<section id="attributable-fraction-evolution" class="level2">
<h2 class="anchored" data-anchor-id="attributable-fraction-evolution">Attributable fraction evolution</h2>
<p>We can plot the time series of daily attributable fractions (AF):</p>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb19">
<pre>col_af &lt;- &quot;black&quot;

temp_colours &lt;- c(
  &quot;#053061&quot;,
  &quot;#2166ac&quot;,
  &quot;#4393c3&quot;,
  &quot;#92c5de&quot;,
  &quot;#d1e5f0&quot;,
  &quot;#f7f7f7&quot;,
  &quot;#fddbc7&quot;,
  &quot;#f4a582&quot;,
  &quot;#d6604d&quot;,
  &quot;#b2182b&quot;,
  &quot;#67001f&quot;
)

af_med &lt;- attr_forw$af.summary[, &quot;0.5quant&quot;]

# Pre-compute range once
af_min &lt;- min(af_med, na.rm = TRUE) - 0.01
af_max &lt;- max(af_med, na.rm = TRUE) + 0.01

df &lt;- data.frame(
  date = london$date,
  x = yday(london$date),
  year = year(london$date),
  tmean = london$tmean,
  af = af_med
)

ggplot(df, aes(x = x)) +
  # Full-height temperature background per day
  geom_rect(
    aes(
      xmin = x - 0.5,
      xmax = x + 0.5,
      ymin = af_min,
      ymax = af_max,
      fill = tmean
    )
  ) +
  scale_fill_gradientn(
    colours = temp_colours,
    name = &quot;Temperature (°C)&quot;
  ) +
  # AF line on top
  geom_line(
    aes(y = af),
    color = col_af,
    linewidth = 0.7
  ) +
  scale_y_continuous(
    name = &quot;Attributable Fraction (AF)&quot;,
    breaks = seq(0, 1, by = 0.1),
    limits = c(af_min, af_max),
    expand = c(0, 0)
  ) +
  scale_x_continuous(
    breaks = yday(as.Date(paste0(
      &quot;2000-&quot;,
      c(&quot;01&quot;, &quot;03&quot;, &quot;05&quot;, &quot;07&quot;, &quot;09&quot;, &quot;11&quot;),
      &quot;-01&quot;
    ))),
    labels = c(&quot;Jan&quot;, &quot;Mar&quot;, &quot;May&quot;, &quot;Jul&quot;, &quot;Sep&quot;, &quot;Nov&quot;),
    expand = c(0, 0)
  ) +
  facet_wrap(~year, ncol = 3, axes = &quot;all_x&quot;) +
  labs(x = NULL) +
  theme_minimal(base_size = 11) +
  theme(
    panel.spacing.x = unit(0, &quot;pt&quot;),
    strip.text = element_text(face = &quot;bold&quot;, size = 10),
    legend.position = &quot;top&quot;,
    legend.key.width = unit(2.5, &quot;cm&quot;)
  )</pre>
</div>
</div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img loading="lazy" decoding="async" src="https://i2.wp.com/r-posts.com/wp-content/uploads/2026/04/attr_evolution.png?w=450" alt="" class="alignnone size-full wp-image-19047" data-recalc-dims="1" /><br />
<br />
Sharp spikes in AF exceeding 60% are visible in summer 2003 and 2006, coinciding with the major European heatwaves. In general, summer episodes produce higher and more abrupt peaks in AF, whereas cold winter days are associated with more sustained elevations over time, though less pronounced in magnitude.</p>
</figure>
</div>
</div>
</div>
</section>
<section id="total-attributable-burden" class="level2">
<h2 class="anchored" data-anchor-id="total-attributable-burden">Total attributable burden</h2>
<p>Summing across the full study period, the table quantifies the total mortality burden attributable to non-optimal temperature exposures in the 75+ population:</p>
<div class="cell">
<div class="code-copy-outer-scaffold">
<div class="sourceCode cell-code" id="cb20">
<pre>rbind(
  &quot;Attributable fraction&quot; = attr_forw$aftotal.summary,
  &quot;Attributable number&quot; = attr_forw$antotal.summary
) |&gt;
  as.data.frame() |&gt;
  round(3) |&gt;
  gt(rownames_to_stub = TRUE)</pre>
</div>
</div>
</div>
<div style="width: 100%">
<table class="gt_table caption-top table table-sm table-striped small" data-quarto-bootstrap="false" style="width: 100%;font-size: 12px;height: 140px">
<thead>
<tr class="gt_col_headings header" style="height: 28px">
<th id="a::stub" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col" style="height: 28px"></th>
<th id="mean" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col" style="height: 28px">mean</th>
<th id="sd" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col" style="height: 28px">sd</th>
<th id="a0.025quant" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col" style="height: 28px">0.025quant</th>
<th id="a0.5quant" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col" style="height: 28px">0.5quant</th>
<th id="a0.975quant" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col" style="height: 28px">0.975quant</th>
<th id="mode" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col" style="height: 28px">mode</th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd" style="height: 56px">
<th id="stub_1_1" class="gt_row gt_left gt_stub" data-quarto-table-cell-role="th" scope="row" style="height: 56px">Attributable fraction</th>
<td class="gt_row gt_right" headers="stub_1_1 mean" style="height: 56px">0.174</td>
<td class="gt_row gt_right" headers="stub_1_1 sd" style="height: 56px">0.018</td>
<td class="gt_row gt_right" headers="stub_1_1 0.025quant" style="height: 56px">0.139</td>
<td class="gt_row gt_right" headers="stub_1_1 0.5quant" style="height: 56px">0.175</td>
<td class="gt_row gt_right" headers="stub_1_1 0.975quant" style="height: 56px">0.207</td>
<td class="gt_row gt_right" headers="stub_1_1 mode" style="height: 56px">0.176</td>
</tr>
<tr class="even" style="height: 56px">
<th id="stub_1_2" class="gt_row gt_left gt_stub" data-quarto-table-cell-role="th" scope="row" style="height: 56px">Attributable number</th>
<td class="gt_row gt_right" headers="stub_1_2 mean" style="height: 56px">68857.597</td>
<td class="gt_row gt_right" headers="stub_1_2 sd" style="height: 56px">7131.526</td>
<td class="gt_row gt_right" headers="stub_1_2 0.025quant" style="height: 56px">55071.066</td>
<td class="gt_row gt_right" headers="stub_1_2 0.5quant" style="height: 56px">69178.391</td>
<td class="gt_row gt_right" headers="stub_1_2 0.975quant" style="height: 56px">81995.459</td>
<td class="gt_row gt_right" headers="stub_1_2 mode" style="height: 56px">69842.155</td>
</tr>
</tbody>
</table>
</div>

Over the full 2000-2012 period, approximately 17.5<strong>%</strong> (95% CrI: 13.9%-20.7%) of all deaths in the London 75+ population were attributable to non-optimal temperatures, corresponding to roughly 69,178 deaths (95% CrI: 55,071-81,996).<br />
<br />
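<p>The attributable number is simply the attributable fraction scaled by total deaths, and the credible intervals fall out of the posterior draws. A minimal sketch in R using simulated draws rather than the model&#8217;s actual posterior (the draw count and total-deaths figure are hypothetical):</p>
<pre>
# Hypothetical posterior draws of the attributable fraction
set.seed(42)
af_draws &lt;- rnorm(4000, mean = 0.174, sd = 0.018)

total_deaths &lt;- 395000  # hypothetical total deaths in the study population

# Attributable-number draws and their posterior summary
an_draws &lt;- af_draws * total_deaths
quantile(an_draws, probs = c(0.025, 0.5, 0.975))
</pre>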
</section>
<section id="conclusions" class="level1">
<h1>Conclusions</h1>
<p>The {bdlnm} package provides a powerful and accessible implementation of Bayesian Distributed Lag Non-Linear Models in R. By combining the flexibility of DLNMs with full Bayesian inference via INLA, it enables researchers to better quantify uncertainty and fit complex exposure-lag-response relationships. This makes it a valuable tool for studying the health impacts of climate change and other environmental risks in increasingly data-rich settings.</p>
<p>This framework is not limited to environmental epidemiology. In fact, it can be applied to any setting involving time-varying exposures and delayed effects (e.g., market shocks may affect asset prices over several days), making it a powerful and general tool for time series analysis.</p>
<p>Development is ongoing. Upcoming features include:</p>
<ul>
	<li><strong>Multi-location analyses</strong>: pooling exposure-lag-response curves across different cities or regions within a single model</li>
	<li><strong>Spatial B-DLNMs (SB-DLNM)</strong>: explicitly modelling spatial heterogeneity in the exposure-lag-response curves of different regions</li>
</ul>
<p>The package is on <a href="https://cran.r-project.org/package=bdlnm" rel="nofollow" target="_blank">CRAN</a>. Bug reports and contributions are welcome via <a href="https://github.com/pasahe/bdlnm" rel="nofollow" target="_blank">GitHub</a>.</p>
</section>
<section id="references" class="level1">
<h1>References</h1>
<ul>
	<li>
<p>Gasparrini A. (2011). Distributed lag linear and non-linear models in R: the package dlnm. <em>Journal of Statistical Software</em>, 43(8), 1-20. <a href="https://doi.org/10.18637/jss.v043.i08" class="uri" rel="nofollow" target="_blank">doi:10.18637/jss.v043.i08</a>.</p>
</li>
	<li>
<p>Quijal-Zamorano M., Martinez-Beneito M.A., Ballester J., Marí-Dell’Olmo M. (2024). Spatial Bayesian distributed lag non-linear models (SB-DLNM) for small-area exposure-lag-response epidemiological modelling. <em>International Journal of Epidemiology</em>, 53(3), dyae061. <a href="https://doi.org/10.1093/ije/dyae061" class="uri" rel="nofollow" target="_blank">doi:10.1093/ije/dyae061</a>.</p>
</li>
	<li>
<p>Rue H., Martino S., Chopin N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. <em>Journal of the Royal Statistical Society: Series B</em>, 71(2), 319-392. <a href="https://doi.org/10.1111/j.1467-9868.2008.00700.x" class="uri" rel="nofollow" target="_blank">doi:10.1111/j.1467-9868.2008.00700.x</a>.</p>
</li>
	<li>
<p>Gasparrini A., Leone M. (2014). Attributable risk from distributed lag models. <em>BMC Medical Research Methodology</em>, 14, 55. <a href="https://doi.org/10.1186/1471-2288-14-55" class="uri" rel="nofollow" target="_blank">doi:10.1186/1471-2288-14-55</a>.</p>
</li>
</ul>
</section><hr style="border-top: black solid 1px" /><a href="http://r-posts.com/new-r-package-bdlnm-released-on-cran-bayesian-distributed-lag-non-linear-models-in-r-via-inla/" rel="nofollow" target="_blank">New R Package {bdlnm} Released on CRAN: Bayesian Distributed Lag Non-Linear Models in R via INLA</a> was first posted on April 10, 2026 at 5:14 pm.<br />
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="http://r-posts.com/new-r-package-bdlnm-released-on-cran-bayesian-distributed-lag-non-linear-models-in-r-via-inla/"> R-posts.com</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/new-r-package-bdlnm-released-on-cran-bayesian-distributed-lag-non-linear-models-in-r-via-inla/">New R Package {bdlnm} Released on CRAN: Bayesian Distributed Lag Non-Linear Models in R via INLA</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400475</post-id>	</item>
		<item>
		<title>TheseusPlot 0.2.0: Visualizing Decomposition of Differences in Rate Metrics</title>
		<link>https://www.r-bloggers.com/2026/04/theseusplot-0-2-0-visualizing-decomposition-of-differences-in-rate-metrics/</link>
		
		<dc:creator><![CDATA[Koji Makiyama]]></dc:creator>
		<pubDate>Fri, 10 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://hoxo-m.github.io/blog/posts/TheseusPlot-0-2-0/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>TheseusPlot is an R package that decomposes differences in rate metrics between two groups into contributions from individual subgroups and visualizes the results as a “Theseus Plot”.<br />
The package is inspired by the Ship of Theseus thought experi...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/theseusplot-0-2-0-visualizing-decomposition-of-differences-in-rate-metrics/">TheseusPlot 0.2.0: Visualizing Decomposition of Differences in Rate Metrics</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://hoxo-m.github.io/blog/posts/TheseusPlot-0-2-0/"> HOXO-M Blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<p><strong>TheseusPlot</strong> is an R package that decomposes differences in rate metrics between two groups into contributions from individual subgroups and visualizes the results as a “Theseus Plot”.</p>
<p>The package is inspired by the <a href="https://en.wikipedia.org/wiki/Ship_of_Theseus" rel="nofollow" target="_blank">Ship of Theseus</a> thought experiment. It replaces subgroup data step by step, recalculates the overall metric at each step, and interprets each change as that subgroup’s contribution to the overall difference.</p>
<p>Suppose you notice that the click-through rate is lower in 2025 than in 2024 and want to examine how a particular attribute, such as gender, contributed to the change. If you obtain a Theseus Plot like the one below, it suggests that men contributed more to the decline in click-through rate than women.</p>
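<p>The replacement idea itself fits in a few lines of base R. The sketch below is purely illustrative (made-up click and impression counts, and not TheseusPlot&#8217;s actual implementation): it swaps each subgroup&#8217;s 2024 data for its 2025 data in turn and records how the overall rate moves at each step.</p>
<pre>
rate &lt;- function(clicks, views) sum(clicks) / sum(views)

# Hypothetical subgroup data for the two years
clicks_2024 &lt;- c(men = 300, women = 250); views_2024 &lt;- c(men = 5000, women = 4000)
clicks_2025 &lt;- c(men = 200, women = 240); views_2025 &lt;- c(men = 5200, women = 4100)

contrib &lt;- numeric(0)
cur_clicks &lt;- clicks_2024; cur_views &lt;- views_2024
for (g in names(clicks_2024)) {
  before &lt;- rate(cur_clicks, cur_views)
  cur_clicks[g] &lt;- clicks_2025[g]; cur_views[g] &lt;- views_2025[g]
  contrib[g] &lt;- rate(cur_clicks, cur_views) - before
}
contrib       # each subgroup&#39;s contribution to the overall change
sum(contrib)  # equals the total 2024-to-2025 change in the rate
</pre>
<p>Because the replacements telescope, the per-subgroup contributions always sum to the total difference between the two groups.</p>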
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i1.wp.com/hoxo-m.github.io/blog/posts/TheseusPlot-0-2-0/man/figures/README-overview-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<section id="whats-new-in-0.2.0" class="level2">
<h2 class="anchored" data-anchor-id="whats-new-in-0.2.0">What’s new in 0.2.0</h2>
<p>Version 0.2.0 includes the following changes:</p>
<ul>
<li>a fix for continuous-variable discretization with <code>split = &quot;rate&quot;</code>, where bin boundaries for the second group could previously be computed from the first group’s data</li>
<li>a fix for the size bar of <code>&quot;Sum of ... other attributes&quot;</code>, which could incorrectly use the first group’s counts for both groups</li>
<li>a fix for warnings in <code>plot()</code> and <code>plot_flip()</code> when multiple subgroups were tied for the largest absolute contribution</li>
<li>suppression of warnings generated internally by <code>waterfalls::waterfall()</code> during plot creation.</li>
</ul>
<p>I would like to thank Kazuyuki Sano for reporting the first two issues and contributing to their fixes.</p>
</section>
<section id="installation" class="level2">
<h2 class="anchored" data-anchor-id="installation">Installation</h2>
<p>You can install <strong>TheseusPlot</strong> from CRAN with:</p>
<pre>install.packages(&quot;TheseusPlot&quot;)</pre>
</section>
<section id="try-it-out" class="level2">
<h2 class="anchored" data-anchor-id="try-it-out">Try it out</h2>
<p><strong>TheseusPlot</strong> may be useful when you want to understand why metrics such as conversion rate, retention rate, or click-through rate changed.</p>
<p>For details on how to use it, please see the package website: <a href="https://hoxo-m.github.io/TheseusPlot/" rel="nofollow" target="_blank">https://hoxo-m.github.io/TheseusPlot/</a>.</p>


</section>

 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://hoxo-m.github.io/blog/posts/TheseusPlot-0-2-0/"> HOXO-M Blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/theseusplot-0-2-0-visualizing-decomposition-of-differences-in-rate-metrics/">TheseusPlot 0.2.0: Visualizing Decomposition of Differences in Rate Metrics</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400577</post-id>	</item>
		<item>
		<title>Using R to Teach R: Lessons for Software Development</title>
		<link>https://www.r-bloggers.com/2026/04/using-r-to-teach-r-lessons-for-software-development/</link>
		
		<dc:creator><![CDATA[The Jumping Rivers Blog]]></dc:creator>
		<pubDate>Thu, 09 Apr 2026 23:59:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.jumpingrivers.com/blog/teaching-r-packages-reporting-gitlab/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>As we approach the ten-year anniversary of Jumping Rivers’ founding in 2016, it’s a good time to reflect on what we have achieved in that time and share some lessons learned.<br />
If you have read our blogs previously then you will be aware that Jumping Rivers is a ...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/using-r-to-teach-r-lessons-for-software-development/">Using R to Teach R: Lessons for Software Development</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.jumpingrivers.com/blog/teaching-r-packages-reporting-gitlab/"> The Jumping Rivers Blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>
<a href = "https://www.jumpingrivers.com/blog/teaching-r-packages-reporting-gitlab/">
<img src="https://i1.wp.com/www.jumpingrivers.com/blog/teaching-r-packages-reporting-gitlab/featured.png?w=400&#038;ssl=1" style="width:400px" class="image-center" style="display: block; margin: auto;" data-recalc-dims="1" />
</a>
</p>
<p>As we approach the ten-year anniversary of Jumping Rivers’ founding in 2016, it’s a good time to reflect on what we have achieved in that time and share some lessons learned.</p>
<p>If you have read our blogs previously then you will be aware that Jumping Rivers is a consultancy and training provider in all things data science. But did you know that we offer over 50 different courses spanning R, Python, Git, SQL and more?</p>
<p>In this blog we will provide a glimpse into our internal process and share how we have streamlined the task of maintaining so many courses. Along the way we will share some good practices applicable to any big coding project, including packaging of source code and automated CI/CD.</p>
<aside class="advert">
<p>
Whether you want to start from scratch, or improve your skills, <a href="https://www.jumpingrivers.com/training/?utm_source=blog&#038;utm_medium=banner&#038;utm_campaign=2026-teaching-r-packages-reporting-gitlab" rel="nofollow" target="_blank">Jumping Rivers has a training course for you</a>.
</p>
</aside>
<h2 id="the-challenge">The challenge</h2>
<p>Let’s start by laying out the key challenges which face us.</p>
<h3 id="1-multilingual-support">1. Multilingual support</h3>
<p>Our <a href="https://www.jumpingrivers.com/training/all-courses/" rel="nofollow" target="_blank">course catalogue</a> consists of over 50 courses. The majority of these are either based on R or Python or both:</p>
<ul>
<li>50% R</li>
<li>30% Python</li>
<li>5% R and Python</li>
<li>15% other (Git, SQL, Tableau, Posit and more)</li>
</ul>
<p>At the very least, any solution that we come up with for standardising our courses must be compatible with both R and Python. Ideally it should also support the less commonly taught languages and tools in our catalogue, such as SQL and Git.</p>
<h3 id="2-maintenance">2. Maintenance</h3>
<p>The world of R and Python is constantly changing. The languages themselves receive frequent updates, as do publicly available R packages on <a href="https://cran.r-project.org/" rel="nofollow" target="_blank">CRAN</a> and Python packages on <a href="https://pypi.org/" rel="nofollow" target="_blank">PyPI</a>.</p>
<p>This has the consequence that code which worked one year ago (or even one day) may no longer be functional with the latest package versions. We will need some way to track this and ensure that the code examples covered in our courses remain relevant and error-free.</p>
<h3 id="3-demand">3. Demand</h3>
<p>We deliver over 100 courses per year. For a relatively small team of data scientists, this can be a lot to juggle!</p>
<p>In an ideal world, the process of building the course materials, setting up the cloud environment for training, and managing all of the administration that goes along with this should be automated. That way, the trainer can focus on providing the highest quality experience for the attendees without having to worry about things going wrong on the day.</p>
<h2 id="the-solution">The solution</h2>
<p>Our team is used to setting up data science workflows for clients, including automated reporting and migration of source code into packages. We have therefore applied these techniques in our internal processes, including training.</p>
<h3 id="automated-reporting">Automated reporting</h3>
<p>Suppose you maintain a document that has to be updated on a regular basis, such as a monthly presentation showing the latest company revenues. Does this scenario sound familiar?</p>
<p>We <em>could</em> regenerate the plots and data tables and manually copy and paste these into the report document. Even better, we can take advantage of free-to-use automated reporting frameworks including <a href="https://rmarkdown.rstudio.com/" rel="nofollow" target="_blank">R Markdown</a> and <a href="https://quarto.org/" rel="nofollow" target="_blank">Quarto</a>.</p>
<p>R Markdown and Quarto both work as follows:</p>
<ul>
<li>
<p>We provide a “YAML header” at the top of the report document with configuration and formatting options:</p>
<pre>---
title: &quot;Introduction to Python&quot;
authors:
- &quot;Myles Mitchell&quot;
date: &quot;2026-04-02&quot;
output: pdf
---
</pre></li>
<li>
<p>The report body is formatted as Markdown and supports a mixture of plain text and code:</p>
<pre>## Introduction
At its most basic, Python is essentially a calculator.
We can run basic calculations as follows:
```{python}
2 + 1
```
We can also assign the output of a calculation to a
variable so that it can be reused later:
```{python}
x = 2 + 1
print(x)
```
</pre></li>
</ul>
<p>Notice that we have included chunks of Python code. By making use of <em>chunk options</em> we can configure code chunks to be executed when rendering the report. Any outputs from the code (plots, tables, summary statistics) can then be displayed.</p>
<p>By migrating the code logic into the report itself, we can update our report assets at the click of a button whenever the data changes.</p>
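<p>That &#8220;click of a button&#8221; is a single render call in practice. For example (the file name is a placeholder):</p>
<pre>
# Re-executes every code chunk against the latest data, then rebuilds the output
rmarkdown::render(&quot;monthly-report.Rmd&quot;)      # R Markdown
quarto::quarto_render(&quot;monthly-report.qmd&quot;)  # Quarto, via the quarto R package
</pre>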
<p>We have taken inspiration from this approach with our course notes and presentation slides. This forces us to be rigorous with the code examples. Any runtime errors that are produced by faulty or outdated code would be visible in the course notes and by extension to the attendees of our courses.</p>
<p>Crucially for us, R Markdown and Quarto are both compatible with R and Python. They also support syntax highlighting for languages like Git and SQL, as well as a variety of output formats including HTML and PDF.</p>
<img src="https://i2.wp.com/www.jumpingrivers.com/blog/teaching-r-packages-reporting-gitlab/quarto-flow-chart.png?w=578&#038;ssl=1" alt="Flow chart illustrating the automated reporting workflow with Quarto. Starting with a text-based .qmd file, this is converted into a Markdown format using Jupyter or knitr. Pandoc is then used to convert this into a variety of output formats including HTML, PDF and Word." data-recalc-dims="1" />
<h3 id="internal-r-packages">Internal R packages</h3>
<p>So we have settled on a solution for building our course notes. But we have 50 different courses, and setting these up from scratch each time is going to get tedious!</p>
<p>A good practice in any coding project is to avoid duplication as much as possible. Instead of copying and pasting code, we should really be migrating code into functions which are self contained, reusable and easy to test. This will mean fewer places to debug when things inevitably go wrong.</p>
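<p>The same principle in miniature: once a snippet is pasted into a second place, it is a candidate for a function. An illustrative example (not one of our internal helpers):</p>
<pre>
# One self-contained, reusable, testable helper instead of repeated snippets
standardise_names &lt;- function(df) {
  names(df) &lt;- tolower(gsub(&quot;[^A-Za-z0-9]+&quot;, &quot;_&quot;, names(df)))
  df
}
</pre>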
<p>Following a similar philosophy for our training infrastructure, we have migrated any reusable assets for our courses—including logos, template files and styling—into a collection of internal R packages.</p>
<p>When building a new course, the developer can now focus on the aspects that are unique to that course:</p>
<ul>
<li>Code examples</li>
<li>Notes</li>
<li>Exercises</li>
<li>Presentation slides</li>
</ul>
<p>Everything else is taken care of automatically:</p>
<ul>
<li>The appearance of the course notes and presentation slides.</li>
<li>Build routines including converting the R Markdown / Quarto text files into HTML.</li>
</ul>
<p>In addition to course templates, we also have internal packages for managing the administrative side of training, including:</p>
<ul>
<li>Calculating pricing quotes for clients.</li>
<li>Generating post-course certificates.</li>
<li>Spinning up a bespoke <a href="https://posit.co/products/enterprise/workbench/" rel="nofollow" target="_blank">Posit Workbench</a> environment for the course.</li>
<li>Summarising attendee feedback.</li>
</ul>
<p>And the list goes on!</p>
<h3 id="gitlab-cicd">GitLab CI/CD</h3>
<p>With automated reporting and packaging of source code, we have created standardised routines that can be applied to any of our courses.</p>
<p>This does not change the fact that we have over 50 courses to maintain. We still need a way of testing our courses and tracking issues. This is where CI/CD (Continuous Integration / Continuous Delivery and Deployment) comes in.</p>
<p>CI/CD defines a framework for software development, including:</p>
<ul>
<li>Automated unit testing.</li>
<li>Branching of source code and code review.</li>
<li>Versioning and deployment of software.</li>
</ul>
<p>If you maintain software then you have likely come across version control with Git. Cloud platforms like <a href="https://gitlab.com/" rel="nofollow" target="_blank">GitLab</a> and <a href="https://github.com/" rel="nofollow" target="_blank">GitHub</a> provide tools for collaborative code development. Not only do they provide a cloud backup of your source code, they also provide the following features:</p>
<ul>
<li>CI/CD tools for automated testing, build and deployment.</li>
<li>Branch rules for enforcing good practices like code review and unit testing.</li>
<li>Versioning and tagging of source code.</li>
</ul>
<p>Each of our courses is maintained via its own GitLab repository. The CI/CD pipelines for our courses are defined in a separate repository along with the internal R packages mentioned above.</p>
<img src="https://i0.wp.com/www.jumpingrivers.com/blog/teaching-r-packages-reporting-gitlab/standardisation.png?w=578&#038;ssl=1" alt="Flow chart illustrating how we have standardised our GitLab training repositories. The templates are defined in a central repository and pushed downstream to our course repositories." data-recalc-dims="1" />
<p>When setting up a new course, the course repository will be automatically populated with the template CI/CD rules. All courses are therefore subject to the same stringent checks, including:</p>
<ul>
<li>Ensuring that the course notes build without errors.</li>
<li>Enforcing code review of any course updates before these are merged into the main branch.</li>
<li>Building and storing the <em>artifacts</em> (the rendered HTML notes and coding scripts) for the latest version of the course.</li>
</ul>
<p>These checks are triggered by any updates to a course. We also schedule monthly CI/CD pipelines for all courses, with any issues immediately flagged to our trainers.</p>
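<p>For readers unfamiliar with GitLab CI/CD, checks of this kind are declared in a <code>.gitlab-ci.yml</code> file. The sketch below is illustrative only (job name, image and paths are placeholders, not our actual configuration):</p>
<pre>
# Runs on every push and on the monthly schedule;
# the pipeline fails if any code chunk in the notes errors
build-notes:
  image: rocker/verse:latest
  script:
    - quarto render notes/
  artifacts:
    paths:
      - notes/_site/
</pre>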
<p>We have also taken advantage of GitLab’s folder-like structure for organising code repositories. Within the Jumping Rivers project on GitLab, we have a subproject called “training”. All of our course-related repositories are located “downstream” from this project. This means that any settings or environment variables defined at the “training” level are automatically applied to all of our courses.</p>
<h2 id="in-summary">In summary</h2>
<p>The take-home lessons from this blog are applicable to any big coding project:</p>
<ul>
<li>Avoid duplication: migrate any reusable logic or assets into standalone packages.</li>
<li>Utilise CI/CD workflows using GitLab, GitHub or similar.</li>
<li>Focus on what matters by automating as much of the process as possible.</li>
</ul>
<p>Our training infrastructure has taken 10 years to build and is still constantly evolving; we have not even covered the full process in this blog! For a deeper dive, check out this <a href="https://youtu.be/MD0F3ChgqBE?si=EFSHE6MOqgU5I9UM" rel="nofollow" target="_blank">talk</a> by Myles at SatRdays London 2024.</p>
<p>For more on automated reporting, check out:</p>
<ul>
<li><a href="https://www.jumpingrivers.com/blog/quarto-for-python-users/" rel="nofollow" target="_blank">Quarto for the Python user</a>.</li>
<li><a href="https://www.jumpingrivers.com/blog/r-parameterised-presentations-quarto/" rel="nofollow" target="_blank">Parameterised presentations with Quarto</a>.</li>
</ul>
<p>For more on packaging of source code, check out:</p>
<ul>
<li><a href="https://www.jumpingrivers.com/blog/personal-r-package/" rel="nofollow" target="_blank">Writing a personal R package</a>.</li>
<li>Three-part series: <a href="https://www.jumpingrivers.com/blog/?search=creating+a+python+package" rel="nofollow" target="_blank">Creating a Python package</a>.</li>
<li>Four-part series: <a href="https://www.jumpingrivers.com/blog/?search=r+package+quality" rel="nofollow" target="_blank">R package quality</a>.</li>
</ul>
<p>
For updates and revisions to this article, see the <a href = "https://www.jumpingrivers.com/blog/teaching-r-packages-reporting-gitlab/">original post</a>
</p>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.jumpingrivers.com/blog/teaching-r-packages-reporting-gitlab/"> The Jumping Rivers Blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/using-r-to-teach-r-lessons-for-software-development/">Using R to Teach R: Lessons for Software Development</a>]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">400414</post-id>	</item>
		<item>
		<title>Hold On Hope: publication lag times at cell biology journals</title>
		<link>https://www.r-bloggers.com/2026/04/hold-on-hope-publication-lag-times-at-cell-biology-journals/</link>
		
		<dc:creator><![CDATA[Stephen Royle]]></dc:creator>
		<pubDate>Thu, 09 Apr 2026 11:22:53 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://quantixed.org/?p=3723</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> I’ve posted about publication lag times previously. The “lag” refers to the time between submitting a paper and its appearance in a journal. Publication lag times are still a frustration for researchers. Although preprints circumvent the delay in sharing science with others, publication is still king when it comes ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/hold-on-hope-publication-lag-times-at-cell-biology-journals/">Hold On Hope: publication lag times at cell biology journals</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://quantixed.org/2026/04/09/hold-on-hope-publication-lag-times-at-cell-biology-journals/"> Rstats – quantixed</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>I’ve posted about publication <a href="https://quantixed.org/tag/lag-times/" rel="nofollow" target="_blank">lag times</a> previously. The “lag” refers to the time between submitting a paper and its appearance in a journal.</p>



<p>Publication lag times are still a frustration for researchers. Although preprints circumvent the delay in sharing science with others, publication is still king when it comes to evaluation. Contracts are short and publication delays can be long…</p>



<p>I recently saw a post comparing <a href="https://sashagusev.github.io/Genetics_Pub_Dates/" rel="nofollow" target="_blank">median publication lag times for genetics journals</a>. This motivated me to update my code and rerun the analysis for cell biology journals to see what, if anything, has changed.</p>



<h2 class="wp-block-heading">Methodology</h2>



<p>I wrote an R package <a href="https://github.com/quantixed/PubMedLagR" rel="nofollow" target="_blank">PubMedLagR</a> which uses <code>{rentrez}</code> to retrieve the publication data from PubMed. Once we have this data, it is a matter of producing some plots which I have covered <a href="https://quantixed.org/2021/04/04/ten-years-vs-the-spread-ii-calculating-publication-lag-times-in-r/" data-type="post" data-id="2369" rel="nofollow" target="_blank">previously</a>. To ensure that we are only looking at research papers, we use <code>&quot;journal article&quot;[pt]</code> as a search term, and also remove from the data anything else (Reviews, Commentaries etc.). Feel free to use it to look at other journals.</p>
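<p>For a sense of the underlying query, here is a direct <code>{rentrez}</code> call of the kind the package wraps (the journal and year are placeholders):</p>
<pre>
library(rentrez)
res &lt;- entrez_search(
  db = &quot;pubmed&quot;,
  term = &#39;&quot;J Cell Biol&quot;[jour] AND 2020[pdat] AND &quot;journal article&quot;[pt]&#39;,
  retmax = 0
)
res$count  # number of matching research papers
</pre>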



<h2 class="wp-block-heading">Caveats</h2>



<p>Before we get started, there are some caveats. The analysis is only as good as the PubMed data. Not all journals submit their date information to PubMed, while for others it is incomplete (as we’ll see below). There are certainly inaccuracies: I found a paper that was supposedly submitted on 1970-01-01 (the Unix epoch), more than 40 years before the journal started. Also, it’s well known that some journals “restart the clock” on a paper by rejecting it and allowing resubmission, then only counting the resubmitted version. So comparison between journals is a little tricky, but the data still let us look at trends.</p>



<h2 class="wp-block-heading">Let’s dive in</h2>



<p>We can use the following code to grab the data we are interested in.</p>


<pre>
library(PubMedLagR)
# journals of interest, using PubMed title abbreviations
jrnl_list &lt;- c(&quot;J Cell Sci&quot;, &quot;Mol Biol Cell&quot;, &quot;J Cell Biol&quot;, &quot;Nat Cell Biol&quot;,
               &quot;EMBO J&quot;, &quot;Biochem J&quot;, &quot;Dev Cell&quot;, &quot;FASEB J&quot;, &quot;J Biol Chem&quot;,
               &quot;Cells&quot;, &quot;Front Cell Dev Biol&quot;, &quot;Nature Communications&quot;,
               &quot;Cell Reports&quot;, &quot;Mol Cell&quot;, &quot;Autophagy&quot;, &quot;Cell Death Differ&quot;,
               &quot;Cell Death Dis&quot;, &quot;Cell Res&quot;, &quot;Sci Adv&quot;, &quot;Cell&quot;)
# years of interest
yrs &lt;- 2006:2026
# download the PubMed records for each journal and year
retrieve_journal_year_records(jrnl_list, yrs, batch_size = 250)
# collect the retrieved records into a data frame of papers
pprs &lt;- pubmed_xmls_to_df()
</pre>


<p>The list of journals is somewhat arbitrary. I have included Nature Communications and Science Advances although they carry many other papers besides cell biology. I included Cell (even though there’s not much cell biology in there these days) and left out Nature and Science.</p>
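
<p>Given the caveats above, some defensive cleaning helps before plotting. A minimal sketch, assuming the data frame <code>pprs</code> has <code>Date</code> columns <code>received</code> and <code>published</code> (the column names here are my assumptions, not necessarily the package&#8217;s actual output):</p>

<pre>
# lag time from submission to publication, in days
pprs$lag_rec_pub &lt;- as.numeric(pprs$published - pprs$received)
# drop epoch placeholders and negative lags
pprs &lt;- subset(
  pprs,
  received &gt; as.Date(&quot;1990-01-01&quot;) &amp; lag_rec_pub &gt; 0
)
</pre>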



<p><strong>How many papers are in each of these journals and how has that changed over time?</strong></p>
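
<p>A quick way to tally this with <code>{dplyr}</code>, again assuming <code>journal</code> and <code>year</code> columns in <code>pprs</code> (column names are my assumption):</p>

<pre>
library(dplyr)
# papers per journal per year
paper_counts &lt;- count(pprs, journal, year, name = &quot;n_papers&quot;)
</pre>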



<figure data-wp-context="{"imageId":"69d7ae27af126"}" data-wp-interactive="core/image" data-wp-key="69d7ae27af126" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" fetchpriority="high" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/04/journals_facet-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3724" srcset_temp="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/04/journals_facet-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/journals_facet-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/journals_facet-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/journals_facet-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/journals_facet-2048x1170.png 2048w" sizes="(max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p>There’s a huge increase in the number of papers published by Nature Communications, Science Advances and Cell Reports. Nature Communications is a behemoth, publishing ~12,500 papers in 2025.</p>



<p>There was a boom and bust in publications at Front Cell Dev Biol and Cells, which could be due to reputational problems (like <a href="https://arstechnica.com/science/2024/02/scientists-aghast-at-bizarre-ai-rat-with-huge-genitals-in-peer-reviewed-article/" rel="nofollow" target="_blank">this</a>). Other journals have declined, most noticeably J Biol Chem, but several others have taken a hit too. The reasons behind these dynamics are discussed <a href="https://doi.org/10.1371/journal.pbio.3002234" rel="nofollow" target="_blank">elsewhere</a>.</p>



<h2 class="wp-block-heading">Median publication lag time</h2>



<p>We’ll use received-to-published as our measure of publication lag time. This is the time from submission of a paper until it appears in the journal “in print”. It’s measured here in days.</p>
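
<p>The medians can be computed per journal and year with <code>{dplyr}</code>; a sketch assuming a <code>lag_rec_pub</code> column of lag times in days (column names are my assumptions):</p>

<pre>
library(dplyr)
lag_medians &lt;- pprs |&gt;
  group_by(journal, year) |&gt;
  summarise(median_lag = median(lag_rec_pub, na.rm = TRUE),
            .groups = &quot;drop&quot;)
</pre>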



<figure data-wp-context="{"imageId":"69d7ae27af794"}" data-wp-interactive="core/image" data-wp-key="69d7ae27af794" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/04/lag_rec_pub_facet-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3725" srcset_temp="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/04/lag_rec_pub_facet-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/lag_rec_pub_facet-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/lag_rec_pub_facet-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/lag_rec_pub_facet-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/lag_rec_pub_facet-2048x1170.png 2048w" sizes="(max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p>For the last 5 years, the median lag time at Nature Cell Biology has been over one year. Other outlets are close to this, e.g. Cell and Dev Cell, whereas others linger around 200 days, or lower in the case of J Cell Sci, FASEB J and others. The journal Autophagy is missing here because there are no data for it. Others, like Sci Adv and MBoC, have only minimal data available. The shortest lag times were for Cells (which might not surprise some people).</p>



<p>The lion’s share of this lag time is the time from submission to acceptance (received-accepted), with a small contribution from the time taken to formally publish the article (accepted-published).</p>



<figure class="wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex">
<figure data-wp-context="{"imageId":"69d7ae27afcd1"}" data-wp-interactive="core/image" data-wp-key="69d7ae27afcd1" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3726" src="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/lag_acc_pub_facet-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3726" srcset_temp="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/lag_acc_pub_facet-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/lag_acc_pub_facet-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/lag_acc_pub_facet-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/lag_acc_pub_facet-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/lag_acc_pub_facet-2048x1170.png 2048w" sizes="(max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<figure data-wp-context="{"imageId":"69d7ae27b016a"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b016a" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3727" src="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/lag_rec_acc_facet-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3727" srcset_temp="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/lag_rec_acc_facet-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/lag_rec_acc_facet-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/lag_rec_acc_facet-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/lag_rec_acc_facet-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/lag_rec_acc_facet-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>
</figure>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td>Journal</td><td>Year</td><td>Accepted-Published (days)</td><td>Received-Published (days)</td><td>Received-Accepted (days)</td></tr><tr><td>Biochem J</td><td>2025</td><td>6</td><td>125</td><td>116</td></tr><tr><td>Cell</td><td>2025</td><td>29</td><td>308.5</td><td>275.5</td></tr><tr><td>Cell Death Differ</td><td>2025</td><td>13</td><td>235</td><td>221</td></tr><tr><td>Cell Death Dis</td><td>2025</td><td>18</td><td>218</td><td>193</td></tr><tr><td>Cell Rep</td><td>2025</td><td>23</td><td>233</td><td>209</td></tr><tr><td>Cell Res</td><td>2025</td><td>34</td><td>208.5</td><td>172</td></tr><tr><td>Cells</td><td>2025</td><td>16</td><td>57</td><td>41</td></tr><tr><td>Dev Cell</td><td>2025</td><td>27</td><td>322</td><td>296</td></tr><tr><td>EMBO J</td><td>2025</td><td>27</td><td>222</td><td>196</td></tr><tr><td>FASEB J</td><td>2025</td><td>13</td><td>131</td><td>117</td></tr><tr><td>Front Cell Dev Biol</td><td>2025</td><td>35</td><td>111</td><td>71</td></tr><tr><td>J Biol Chem</td><td>2025</td><td>9</td><td>128</td><td>117</td></tr><tr><td>J Cell Biol</td><td>2025</td><td>35</td><td>259</td><td>218</td></tr><tr><td>J Cell Sci</td><td>2025</td><td>13</td><td>168.5</td><td>155</td></tr><tr><td>Mol Cell</td><td>2025</td><td>27</td><td>258</td><td>231</td></tr><tr><td>Nat Cell Biol</td><td>2025</td><td>48</td><td>381.5</td><td>327</td></tr><tr><td>Nat Commun</td><td>2025</td><td>20</td><td>266</td><td>241</td></tr></tbody></table></figure>



<p>This is using the median time. Obviously, some papers whizz straight in, whereas others… don’t. Let’s have a look.</p>
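
<p>A per-paper scatter can be sketched with <code>{ggplot2}</code> (a minimal version, assuming columns <code>pub_date</code> and <code>lag_rec_pub</code>; the column names are my assumptions):</p>

<pre>
library(ggplot2)
ggplot(pprs, aes(x = pub_date, y = lag_rec_pub)) +
  geom_point(alpha = 0.1, size = 0.3) +
  coord_cartesian(ylim = c(0, 3 * 365)) +  # cap the view at 3 years
  facet_wrap(~ journal) +
  labs(x = &quot;Publication date&quot;, y = &quot;Lag time (days)&quot;)
</pre>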



<figure data-wp-context="{"imageId":"69d7ae27b0e23"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b0e23" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/lag_rec_pub_scatter_scale-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3728" srcset_temp="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/lag_rec_pub_scatter_scale-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/lag_rec_pub_scatter_scale-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/lag_rec_pub_scatter_scale-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/lag_rec_pub_scatter_scale-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/lag_rec_pub_scatter_scale-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p>You can see many examples of papers that have lag times of 1, 2 or 3 years. I scaled all the plots to a maximum lag time of 3 years. There were several examples of papers with lag times of up to 7 years that looked to be genuine, but they distorted the view.</p>



<p><strong>So what trends can we see?</strong></p>



<p>The lag times are creeping up at some journals, but not at others. It’s possible to see some very short lag times (in amongst the longer ones) recently at Dev Cell, Mol Cell and EMBO J. I assume these are transfers into the journal, leading to rapid publication. However, Cell also has many of those, so perhaps there are other explanations.</p>



<p>To look at each journal in more detail, I made some graphics which use ridgelines to show the profile of lag times for each year.</p>
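
<p>Ridgeline profiles like these can be generated with <code>{ggridges}</code> (which may or may not be what was used here); a minimal sketch under the same assumed column names:</p>

<pre>
library(ggplot2)
library(ggridges)
# one density ridge of lag times per year, for a single journal's data
ggplot(pprs, aes(x = lag_rec_pub, y = factor(year))) +
  geom_density_ridges(scale = 2, rel_min_height = 0.01) +
  coord_cartesian(xlim = c(0, 3 * 365)) +
  labs(x = &quot;Lag time (days)&quot;, y = NULL)
</pre>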



<figure class="wp-block-gallery has-nested-images columns-4 is-cropped wp-block-gallery-2 is-layout-flex wp-block-gallery-is-layout-flex">
<figure data-wp-context="{"imageId":"69d7ae27b1420"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b1420" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3729" src="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/BiochemJ_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3729" srcset_temp="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/BiochemJ_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/BiochemJ_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/BiochemJ_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/BiochemJ_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/BiochemJ_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">Biochem J</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b189e"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b189e" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3734" src="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/Cell_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3734" srcset_temp="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/Cell_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/Cell_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/Cell_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/Cell_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/Cell_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">Cell</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b1d82"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b1d82" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3731" src="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/04/CellDeathDiffer_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3731" srcset_temp="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/04/CellDeathDiffer_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/CellDeathDiffer_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/CellDeathDiffer_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/CellDeathDiffer_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/CellDeathDiffer_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">Cell Death Differ</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b222e"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b222e" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3730" src="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/04/CellDeathDis_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3730" srcset_temp="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/04/CellDeathDis_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/CellDeathDis_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/CellDeathDis_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/CellDeathDis_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/CellDeathDis_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">Cell Death Dis</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b26bb"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b26bb" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3735" src="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/CellRep_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3735" srcset_temp="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/CellRep_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/CellRep_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/CellRep_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/CellRep_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/CellRep_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">Cell Rep</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b2b0d"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b2b0d" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3732" src="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/CellRes_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3732" srcset_temp="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/CellRes_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/CellRes_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/CellRes_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/CellRes_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/CellRes_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">Cell Res</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b2fcb"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b2fcb" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3733" src="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/Cells_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3733" srcset_temp="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/Cells_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/Cells_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/Cells_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/Cells_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/Cells_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">Cells</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b3448"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b3448" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3736" src="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/DevCell_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3736" srcset_temp="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/DevCell_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/DevCell_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/DevCell_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/DevCell_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/DevCell_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">Dev Cell</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b38ef"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b38ef" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3737" src="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/04/EMBOJ_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3737" srcset_temp="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/04/EMBOJ_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/EMBOJ_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/EMBOJ_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/EMBOJ_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/EMBOJ_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">EMBO J</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b3d5a"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b3d5a" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3738" src="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/FASEBJ_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3738" srcset_temp="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/FASEBJ_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/FASEBJ_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/FASEBJ_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/FASEBJ_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/FASEBJ_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">FASEB J</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b4222"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b4222" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3739" src="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/FrontCellDevBiol_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3739" srcset_temp="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/FrontCellDevBiol_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/FrontCellDevBiol_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/FrontCellDevBiol_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/FrontCellDevBiol_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/FrontCellDevBiol_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">Front Cell Dev Biol</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b469f"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b469f" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3741" src="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/JBiolChem_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3741" srcset_temp="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/JBiolChem_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/JBiolChem_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/JBiolChem_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/JBiolChem_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/JBiolChem_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">J Biol Chem</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b4aef"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b4aef" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3740" src="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/JCellBiol_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3740" srcset_temp="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/JCellBiol_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/JCellBiol_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/JCellBiol_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/JCellBiol_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/JCellBiol_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">J Cell Biol</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b4f41"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b4f41" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3743" src="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/JCellSci_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3743" srcset_temp="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/JCellSci_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/JCellSci_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/JCellSci_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/JCellSci_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/JCellSci_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">J Cell Sci</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b5446"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b5446" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3742" src="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/04/MolBiolCell_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3742" srcset_temp="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/04/MolBiolCell_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/MolBiolCell_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/MolBiolCell_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/MolBiolCell_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/MolBiolCell_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">Mol Biol Cell</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b5896"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b5896" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3744" src="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/MolCell_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3744" srcset_temp="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/MolCell_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/MolCell_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/MolCell_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/MolCell_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/MolCell_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">Mol Cell</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b5d7a"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b5d7a" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3745" src="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/NatCellBiol_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3745" srcset_temp="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/NatCellBiol_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/NatCellBiol_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/NatCellBiol_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/NatCellBiol_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/NatCellBiol_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">Nat Cell Biol</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b61f9"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b61f9" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3746" src="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/NatCommun_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3746" srcset_temp="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/04/NatCommun_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/NatCommun_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/NatCommun_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/NatCommun_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/NatCommun_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">Nat Commun</figcaption></figure>



<figure data-wp-context="{"imageId":"69d7ae27b66ae"}" data-wp-interactive="core/image" data-wp-key="69d7ae27b66ae" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" data-id="3747" src="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/SciAdv_lag_plots-1024x585.png?w=450&#038;ssl=1" alt="" class="wp-image-3747" srcset_temp="https://i0.wp.com/quantixed.org/wp-content/uploads/2026/04/SciAdv_lag_plots-1024x585.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/SciAdv_lag_plots-300x171.png 300w, https://quantixed.org/wp-content/uploads/2026/04/SciAdv_lag_plots-768x439.png 768w, https://quantixed.org/wp-content/uploads/2026/04/SciAdv_lag_plots-1536x878.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/SciAdv_lag_plots-2048x1170.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">Sci Adv</figcaption></figure>
</figure>



<h2 class="wp-block-heading">And finally</h2>



<p>Incidentally, nine of the top ten papers with the longest lag times were published in Nature Communications. The longest was this one (3263 days). Almost nine years! All papers have their battle stories and I’m sure this one has a tale to tell.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" src="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/04/image-1024x699.png?w=450&#038;ssl=1" alt="" class="wp-image-3748" srcset_temp="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/04/image-1024x699.png?w=450&#038;ssl=1 1024w, https://quantixed.org/wp-content/uploads/2026/04/image-300x205.png 300w, https://quantixed.org/wp-content/uploads/2026/04/image-768x525.png 768w, https://quantixed.org/wp-content/uploads/2026/04/image-1536x1049.png 1536w, https://quantixed.org/wp-content/uploads/2026/04/image.png 1546w" sizes="auto, (max-width: 1024px) 100vw, 1024px" data-recalc-dims="1" /></figure>



<p>—</p>



<p>The post title comes from “Hold On Hope” by Guided by Voices.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://quantixed.org/2026/04/09/hold-on-hope-publication-lag-times-at-cell-biology-journals/"> Rstats – quantixed</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/hold-on-hope-publication-lag-times-at-cell-biology-journals/">Hold On Hope: publication lag times at cell biology journals</a>]]></content:encoded>
					
		

		<post-id xmlns="com-wordpress:feed-additions:1">400426</post-id>	</item>
		<item>
		<title>Developer Engagement and Bioconductor</title>
		<link>https://www.r-bloggers.com/2026/04/developer-engagement-and-bioconductor/</link>
		
		<dc:creator><![CDATA[Nicholas Cooley, PhD]]></dc:creator>
		<pubDate>Thu, 09 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://blog.bioconductor.org/posts/2026-04-09-developer-engagement/</guid>

					<description><![CDATA[<p>Introduction<br />
During the Chan Zuckerberg Initiative’s Essential Open Source Software for Science cycle 6 funding round, the Bioconductor Community Manager, Maria Doyle, secured a grant to fund a developer engagement position for Bioconductor, and...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/developer-engagement-and-bioconductor/">Developer Engagement and Bioconductor</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://blog.bioconductor.org/posts/2026-04-09-developer-engagement/"> Bioconductor community blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>During the Chan Zuckerberg Initiative’s <a href="https://blog.bioconductor.org/posts/2024-07-12-czi-eoss6-grants/" rel="nofollow" target="_blank">Essential Open Source Software for Science</a> cycle 6 funding round, the Bioconductor Community Manager, Maria Doyle, secured a grant to fund a developer engagement position for Bioconductor, and I was fortunate enough to be offered that role. I am Nick Cooley, and I’m excited to see what this role can bring to Bioconductor. My background is relatively diverse: I received my PhD in organic chemistry from the University of Missouri, and I worked on prokaryotic genomics and functional genomics at the University of Pittsburgh from 2017 to 2025.</p>
</section>
<section id="role-responsibilities" class="level2">
<h2 class="anchored" data-anchor-id="role-responsibilities">Role responsibilities</h2>
<p>The mandate of this role is somewhat broad. Bioconductor, and academic computing more generally, face a myriad of distinct and interrelated challenges as hardware, computing paradigms, and education environments change rapidly. Improving developer resources for tackling new and existing challenges, modernizing Bioconductor developer onboarding materials (particularly for early career researchers), and improving recognition mechanisms for community members who volunteer time and effort to the Bioconductor project are all general themes within the scope of the role.</p>
</section>
<section id="some-specific-efforts" class="level2">
<h2 class="anchored" data-anchor-id="some-specific-efforts">Some Specific Efforts</h2>
<p>A few of the specific efforts I’ll be working on in this role include:</p>
<section id="developer-forum" class="level3">
<h3 class="anchored" data-anchor-id="developer-forum">Developer Forum</h3>
<p>The <a href="https://bioconductor.org/developers/developers-forum/" rel="nofollow" target="_blank">Developer Forum</a> had previously been run on a volunteer basis, serving as a community resource for discussing technical and infrastructure issues, concerns, and opportunities. The creation of the Developer Engagement Lead role allowed us to include the Forum as a direct responsibility of this position.</p>
</section>
<section id="developer-champions-program" class="level3">
<h3 class="anchored" data-anchor-id="developer-champions-program">Developer Champions Program</h3>
<p><a href="https://workinggroups.bioconductor.org/" rel="nofollow" target="_blank">Bioconductor working groups</a> have been a pillar of Bioconductor for some time, and represent a considerable amount of volunteer work towards the project. Improving the visibility of the working groups themselves, and the recognition that project contributors receive for participating in them, can go a long way towards ensuring that this work is valued by contributors’ home institutions and funding mechanisms. The Champions Program aims to create a clear recognition mechanism for those volunteer efforts.</p>
</section>
<section id="bioconductor-hackathon-events" class="level3">
<h3 class="anchored" data-anchor-id="bioconductor-hackathon-events">Bioconductor hackathon events</h3>
<p>Community and collaboration are irreplaceable engines of strong research. Many Bioconductor contributors find community and collaboration within their own disciplines or institutions. Providing an avenue for collaborative and technical events within Bioconductor can fill persistent gaps in the research tooling present in the project and provide networking opportunities for early career researchers. Part of this role is <a href="https://bioconductor.org/developers/bioccommits/" rel="nofollow" target="_blank">planning and running these events</a>.</p>
</section>
<section id="bioconductor-documentation-and-llms" class="level3">
<h3 class="anchored" data-anchor-id="bioconductor-documentation-and-llms">Bioconductor documentation and LLMs</h3>
<p>The ways that researchers search for information, tools, and workflow examples are changing with the rise of large language models and their interfaces. There are opportunities to improve how bioinformaticians, especially those outside the Bioconductor community, find and familiarize themselves with research solutions within the Bioconductor project, including through improvements to website search and documentation discoverability. A long-term goal of this role is to work on documentation templates and checking tools to improve searchability by LLMs, and to explore the feasibility of Bioconductor-sanctioned and -managed LLMs.</p>
</section>
</section>
<section id="how-to-get-in-touch" class="level2">
<h2 class="anchored" data-anchor-id="how-to-get-in-touch">How to get in touch</h2>
<p>For developer discussions and ideas, the <a href="https://chat.bioconductor.org/" rel="nofollow" target="_blank">Bioconductor Zulip</a> is the best place to connect.</p>


</section>

<p>
© 2026 Bioconductor. Content is published under <a href="https://creativecommons.org/licenses/by/4.0/" rel="nofollow" target="_blank">Creative Commons CC-BY-4.0 License</a> for the text and <a href="https://opensource.org/licenses/BSD-3-Clause" rel="nofollow" target="_blank">BSD 3-Clause License</a> for any code. | <a href="https://www.r-bloggers.com/" rel="nofollow" target="_blank">R-Bloggers</a>
</p> 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://blog.bioconductor.org/posts/2026-04-09-developer-engagement/"> Bioconductor community blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/developer-engagement-and-bioconductor/">Developer Engagement and Bioconductor</a>]]></content:encoded>
					
		
		<enclosure url="https://blog.bioconductor.org/posts/2026-04-09-developer-engagement/featured-image.jpeg" length="0" type="image/jpeg" />

		<post-id xmlns="com-wordpress:feed-additions:1">400428</post-id>	</item>
		<item>
		<title>Collaborating between Bioconductor and R-universe on Development of Common Infrastructure</title>
		<link>https://www.r-bloggers.com/2026/04/collaborating-between-bioconductor-and-r-universe-on-development-of-common-infrastructure-2/</link>
		
		<dc:creator><![CDATA[The rOpenSci Team]]></dc:creator>
		<pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://blog.bioconductor.org/posts/2026-04-08-r-universe-collaboration/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>This article is cross-posted on rOpenSci and R-Consortium blogs.<br />
For more than two decades, the Bioconductor project has been a cornerstone of the R ecosystem, providing high-quality, peer-reviewed tools for bioinformatics and computational biol...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/collaborating-between-bioconductor-and-r-universe-on-development-of-common-infrastructure-2/">Collaborating between Bioconductor and R-universe on Development of Common Infrastructure</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on <strong><a href="https://blog.bioconductor.org/posts/2026-04-08-r-universe-collaboration/">Bioconductor community blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>]. (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<p><small><i>This article is cross-posted on <a href="https://ropensci.org/blog/" rel="nofollow" target="_blank">rOpenSci</a> and <a href="https://r-consortium.org/blog/" rel="nofollow" target="_blank">R-Consortium</a> blogs.</i></small></p>
<p>For more than two decades, the <a href="https://www.bioconductor.org/" rel="nofollow" target="_blank">Bioconductor project</a> has been a cornerstone of the R ecosystem, providing high-quality, peer-reviewed tools for bioinformatics and computational biology. Its curated repository model, rigorous review standards, and tightly coordinated release process have helped establish Bioconductor as one of the most trusted distribution channels in scientific computing.</p>
<p>However, the infrastructure that supports such a long-standing and large-scale project inevitably accumulates technical debt. Legacy build systems, bespoke tooling, and historically grown workflows add up to costly and unsustainable maintenance work. For this reason, Bioconductor is collaborating with <a href="https://r-universe.dev/" rel="nofollow" target="_blank">R-universe</a> to gradually modernize parts of its infrastructure, while accommodating the project’s scale, governance, and established processes. In turn, Bioconductor is helping R-universe expand and refine its features as we learn to serve the complex needs of the Bioconductor community.</p>
<p>This collaboration reflects a core principle of R-universe as an R Consortium <a href="https://r-consortium.org/all-projects/" rel="nofollow" target="_blank">Infrastructure Steering Committee (ISC)</a> top-level project: supporting reviewed package repositories such as rOpenSci and Bioconductor, and providing modern, open, and reusable infrastructure that strengthens the broader R ecosystem.</p>
<section id="a-shared-mission-tooling-for-managed-repositories" class="level2">
<h2 class="anchored" data-anchor-id="a-shared-mission-tooling-for-managed-repositories">A Shared Mission: Tooling for Managed Repositories</h2>
<p>R-universe was designed as a next-generation package distribution and build system for R. It provides:</p>
<ul>
<li>Continuous building and checking of R packages across platforms</li>
<li>Binary packages for Windows, macOS, Linux, and WebAssembly</li>
<li>Transparent and reproducible build environments managed via GitHub Actions</li>
<li>Dashboards and metadata APIs for monitoring ecosystem health and activity</li>
<li>CRAN-like package repositories with discoverable metrics and documentation</li>
</ul>
<p>From the outset, a key objective has been to support curated and reviewed communities — such as rOpenSci and Bioconductor — by offering modern infrastructure without requiring them to redesign their governance model or review processes.</p>
<p>For Bioconductor, this means introducing functionality incrementally, with consideration for established release cycles and quality control mechanisms:</p>
<ol type="1">
<li>Setting up independent build and dashboard tooling, replicating processes from the current Bioconductor build systems on R-universe infrastructure</li>
<li>Mirroring Windows and macOS binaries produced on R-universe to Bioconductor</li>
<li>Exploring further integration of results and metadata produced by R-universe for Bioconductor health/activity monitoring and aiding the curation processes</li>
<li>Potential future steps toward deeper automation and harmonization</li>
</ol>
<p>By taking small gradual steps towards adopting R-universe components, everyone gets the opportunity to experiment with new tooling and evaluate where adjustments may be needed in order to minimize disruption to existing practices.</p>
<p>An important milestone in this venture is that Bioconductor now uses R-universe to build the Windows and macOS binaries, which significantly reduces costs and the maintenance load on the Bioconductor team. Beyond binary distribution, we are currently exploring deeper integration of R-universe’s continuous check results into Bioconductor’s quality control and release processes.</p>
</section>
<section id="two-universes-release-and-development" class="level2">
<h2 class="anchored" data-anchor-id="two-universes-release-and-development">Two Universes: Release and Development</h2>
<p>Bioconductor maintains two distinct repositories:</p>
<ul>
<li>A <strong>release</strong> branch for stable packages</li>
<li>A <strong>devel</strong> branch for ongoing development and the next release cycle</li>
</ul>
<p>To mirror this structure, we currently operate two dedicated R-universe instances:</p>
<ul>
<li><strong>Development branch:</strong> <a href="https://bioc.r-universe.dev/" rel="nofollow" target="_blank">https://bioc.r-universe.dev</a></li>
<li><strong>Release branch:</strong> <a href="https://bioc-release.r-universe.dev/" rel="nofollow" target="_blank">https://bioc-release.r-universe.dev</a></li>
</ul>
<p>These universes integrate directly with Bioconductor’s existing Git infrastructure and provide continuous builds for packages in both branches.</p>
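<p>Because these universes behave as standard CRAN-like repositories, packages from them should be installable with the usual tooling. A minimal sketch, using the two repository URLs named above (adding a CRAN mirror for dependencies is an assumption on my part, not something stated in the post):</p>

```r
# Sketch: treat the R-universe instances named above as ordinary
# CRAN-like repositories. The CRAN mirror entry is an assumption,
# included so non-Bioconductor dependencies can also resolve.
repos <- c(
  BiocDevel = "https://bioc.r-universe.dev",
  CRAN      = "https://cloud.r-project.org"
)

# Uncomment to install a package (downloads from the network):
# install.packages("DESeq2", repos = repos)

# Inspect the configured repositories before use:
repos[["BiocDevel"]]
```

<p>For the stable branch, swap in <code>https://bioc-release.r-universe.dev</code> instead.</p>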
<p>Through the R-universe dashboard, package maintainers and users can:</p>
<ul>
<li>Inspect cross-platform check results</li>
<li>Review extended BiocCheck diagnostics</li>
<li>Monitor build logs and dependency graphs</li>
<li>Explore rich package metadata and metrics</li>
<li>Publish binary packages for Windows, macOS, and Linux</li>
</ul>
<p>This provides a familiar yet modern interface for Bioconductor contributors, aligned with what users increasingly expect from contemporary R package infrastructure.</p>
<p>Information about each package is available on <code>https://bioc.r-universe.dev/{pkgname}</code>. For example, <a href="https://bioc.r-universe.dev/DESeq2" rel="nofollow" target="_blank">https://bioc.r-universe.dev/DESeq2</a> provides details on the DESeq2 package as shown below:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://i0.wp.com/docs.r-universe.dev/img/bioc-pkg.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="screenshot of r-universe" rel="nofollow" target="_blank"><img src="https://i0.wp.com/docs.r-universe.dev/img/bioc-pkg.png?w=578&#038;ssl=1" class="img-fluid figure-img" alt="screenshot of r-universe" data-recalc-dims="1"></a></p>
<figcaption>screenshot of r-universe</figcaption>
</figure>
</div>
<p>If this is your first time visiting R-universe, we recommend clicking the “Website Tour” button, which will walk you through the most important information in one or two minutes.</p>
</section>
<section id="technical-documentation-for-bioconductor-maintainers" class="level2">
<h2 class="anchored" data-anchor-id="technical-documentation-for-bioconductor-maintainers">Technical Documentation for Bioconductor Maintainers</h2>
<p>The R-universe project maintains comprehensive technical documentation at <a href="https://docs.r-universe.dev/" rel="nofollow" target="_blank">https://docs.r-universe.dev</a>. For Bioconductor specifically, we created a dedicated section summarizing the most relevant topics for developers to get started with R-universe: <a href="https://docs.r-universe.dev/bioconductor/" rel="nofollow" target="_blank">https://docs.r-universe.dev/bioconductor/</a></p>
<p>As the collaboration evolves and new components get introduced, the documentation will continue to be expanded. The goal is to provide Bioconductor maintainers with a clear reference point for understanding how R-universe fits into their development workflow, while maintaining compatibility with the established practices that have made Bioconductor a successful project within the R community.</p>
</section>
<section id="looking-ahead" class="level2">
<h2 class="anchored" data-anchor-id="looking-ahead">Looking Ahead</h2>
<p>Adopting new infrastructure inevitably involves adjustments. For Bioconductor developers, integrating with a new build and distribution system will likely require some changes to workflows, and time to become familiar with new or different package checks, build diagnostics, and binary distribution.</p>
<p>However, by gradually moving toward common infrastructure, the Bioconductor project will benefit from improvements that are being continuously developed and maintained for the broader R ecosystem. A system based on modern continuous integration (CI) will provide developers with improved tooling, and will give the core team more time to focus on community coordination and quality control, rather than on maintaining costly infrastructure. At the same time, the shared platform provided by R-universe can help to increase the visibility and accessibility of Bioconductor software to the greater R community.</p>
<p>We look forward to continuing this alliance and to working with the Bioconductor community to ensure that the next generation of infrastructure supports the project for many years to come.</p>


</section>

<p>
© 2025 Bioconductor. Content is published under <a href="https://creativecommons.org/licenses/by/4.0/" rel="nofollow" target="_blank">Creative Commons CC-BY-4.0 License</a> for the text and <a href="https://opensource.org/licenses/BSD-3-Clause" rel="nofollow" target="_blank">BSD 3-Clause License</a> for any code. | <a href="https://www.r-bloggers.com/" rel="nofollow" target="_blank">R-Bloggers</a>
</p> 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://blog.bioconductor.org/posts/2026-04-08-r-universe-collaboration/"> Bioconductor community blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/collaborating-between-bioconductor-and-r-universe-on-development-of-common-infrastructure-2/">Collaborating between Bioconductor and R-universe on Development of Common Infrastructure</a>]]></content:encoded>
					
		
		<enclosure url="https://docs.r-universe.dev/img/bioc-pkg.png" length="0" type="image/png" />

		<post-id xmlns="com-wordpress:feed-additions:1">400403</post-id>	</item>
		<item>
		<title>Collaborating between Bioconductor and R-universe on Development of Common Infrastructure</title>
		<link>https://www.r-bloggers.com/2026/04/collaborating-between-bioconductor-and-r-universe-on-development-of-common-infrastructure/</link>
		
		<dc:creator><![CDATA[rOpenSci]]></dc:creator>
		<pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://ropensci.org/blog/2026/04/08/r-universe-bioc/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
For more than two decades, the Bioconductor project has been a cornerstone of the R ecosystem, providing high-quality, peer-reviewed tools for bioinformatics and computational biology. Its curated repository model, rigorous review standards, and tight...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/collaborating-between-bioconductor-and-r-universe-on-development-of-common-infrastructure/">Collaborating between Bioconductor and R-universe on Development of Common Infrastructure</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on <strong><a href="https://ropensci.org/blog/2026/04/08/r-universe-bioc/">rOpenSci - open tools for open science</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>]. (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>For more than two decades, the <a href="https://www.bioconductor.org/" rel="nofollow" target="_blank">Bioconductor project</a> has been a cornerstone of the R ecosystem, providing high-quality, peer-reviewed tools for bioinformatics and computational biology. Its curated repository model, rigorous review standards, and tightly coordinated release process have helped establish Bioconductor as one of the most trusted distribution channels in scientific computing.</p>
<p>However, the infrastructure that supports such a long-standing and large-scale project inevitably accumulates technical debt. Legacy build systems, bespoke tooling, and historically grown workflows add up to costly and unsustainable maintenance work. For this reason, Bioconductor is collaborating with <a href="https://r-universe.dev/" rel="nofollow" target="_blank">R-universe</a> to gradually modernize parts of its infrastructure, while accommodating the project’s scale, governance, and established processes. In turn, Bioconductor is helping R-universe expand and refine its features as we learn to serve the complex needs of the Bioconductor community.</p>
<p>This collaboration reflects a core principle of R-universe as an R Consortium <a href="https://r-consortium.org/all-projects/" rel="nofollow" target="_blank">Infrastructure Steering Committee (ISC)</a> top-level project: supporting reviewed package repositories such as rOpenSci and Bioconductor, and providing modern, open, and reusable infrastructure that strengthens the broader R ecosystem.</p>
<h2>
A shared mission: Tooling for managed repositories
</h2><p>R-universe was designed as a next-generation package distribution and build system for R. It provides:</p>
<ul>
<li>Continuous building and checking of R packages across platforms</li>
<li>Binary packages for Windows, macOS, Linux, and WebAssembly</li>
<li>Transparent and reproducible build environments managed via GitHub Actions</li>
<li>Dashboards and metadata APIs for monitoring ecosystem health and activity</li>
<li>CRAN-like package repositories with discoverable metrics and documentation</li>
</ul>
<p>From the outset, a key objective has been to support curated and reviewed communities — such as rOpenSci and Bioconductor — by offering modern infrastructure without requiring them to redesign their governance model or review processes.</p>
<p>For Bioconductor, this means introducing functionality incrementally, with consideration for established release cycles and quality control mechanisms:</p>
<ol>
<li>Setting up independent build and dashboard tooling, replicating processes from the current Bioconductor build systems on R-universe infrastructure</li>
<li>Mirroring Windows and macOS binaries produced on R-universe to Bioconductor</li>
<li>Exploring further integration of results and metadata produced by R-universe for Bioconductor health/activity monitoring and aiding the curation processes</li>
<li>Potential future steps toward deeper automation and harmonization</li>
</ol>
<p>By taking small gradual steps towards adopting R-universe components, everyone gets the opportunity to experiment with new tooling and evaluate where adjustments may be needed in order to minimize disruption to existing practices.</p>
<p>An important milestone in this venture is that Bioconductor now uses R-universe to build the Windows and macOS binaries, which significantly reduces costs and the maintenance load on the Bioconductor team. Beyond binary distribution, we are currently exploring deeper integration of R-universe’s continuous check results into Bioconductor’s quality control and release processes.</p>
<h2>
Two Universes: Release and Development
</h2><p>Bioconductor maintains two distinct repositories:</p>
<ul>
<li>A <strong>release</strong> branch for stable packages</li>
<li>A <strong>devel</strong> branch for ongoing development and the next release cycle</li>
</ul>
<p>To mirror this structure, we currently operate two dedicated R-universe instances:</p>
<ul>
<li><strong>Development branch:</strong> <a href="https://bioc.r-universe.dev/" rel="nofollow" target="_blank">https://bioc.r-universe.dev</a></li>
<li><strong>Release branch:</strong> <a href="https://bioc-release.r-universe.dev/" rel="nofollow" target="_blank">https://bioc-release.r-universe.dev</a></li>
</ul>
<p>These universes integrate directly with Bioconductor’s existing Git infrastructure and provide continuous builds for packages in both branches.</p>
<p>Through the R-universe dashboard, package maintainers and users can:</p>
<ul>
<li>Inspect cross-platform check results</li>
<li>Review extended BiocCheck diagnostics</li>
<li>Monitor build logs and dependency graphs</li>
<li>Explore rich package metadata and metrics</li>
<li>Publish binary packages for Windows, macOS, and Linux</li>
</ul>
<p>This provides a familiar yet modern interface for Bioconductor contributors, aligned with what users increasingly expect from contemporary R package infrastructure.</p>
<p>Information about each package is available on <code>https://bioc.r-universe.dev/{pkgname}</code>. For example, <a href="https://bioc.r-universe.dev/DESeq2" rel="nofollow" target="_blank">https://bioc.r-universe.dev/DESeq2</a> provides details on the DESeq2 package as shown below:</p>
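<p>The per-package URL pattern above can be built programmatically. A short sketch; the <code>/api/packages/{pkgname}</code> JSON endpoint is an assumption based on the general R-universe documentation, not something stated in this post, so verify it against <a href="https://docs.r-universe.dev/" rel="nofollow" target="_blank">docs.r-universe.dev</a> before relying on it:</p>

```r
# Sketch: construct the dashboard page URL (pattern given in the post)
# and an assumed metadata API URL for a given package. The fetch is
# left commented out so the snippet runs offline.
pkg <- "DESeq2"
page_url <- sprintf("https://bioc.r-universe.dev/%s", pkg)
api_url  <- sprintf("https://bioc.r-universe.dev/api/packages/%s", pkg)  # assumed endpoint

# meta <- jsonlite::fromJSON(api_url)  # uncomment to fetch package metadata

page_url
```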
<div class="box" >
<figure   >
<div class="img">
<img  src="https://i0.wp.com/docs.r-universe.dev/img/bioc-pkg.png?w=578&#038;ssl=1" alt="screenshot of r-universe package" data-recalc-dims="1"/>
</div>
<a href="https://docs.r-universe.dev/img/bioc-pkg.png"  aria-disabled="true" rel="nofollow" target="_blank"></a>
</figure>
</div>
<p>If this is your first time visiting R-universe, we recommend clicking the “Website Tour” button, which will walk you through the most important information in one or two minutes.</p>
<h2>
Technical Documentation for Bioconductor Maintainers
</h2><p>The R-universe project maintains comprehensive technical documentation at <a href="https://docs.r-universe.dev/" rel="nofollow" target="_blank">https://docs.r-universe.dev</a>. For Bioconductor specifically, we created a dedicated section summarizing the most relevant topics for developers to get started with R-universe: <a href="https://docs.r-universe.dev/bioconductor/" rel="nofollow" target="_blank">https://docs.r-universe.dev/bioconductor/</a></p>
<p>As the collaboration evolves and new components get introduced, the documentation will continue to be expanded. The goal is to provide Bioconductor maintainers with a clear reference point for understanding how R-universe fits into their development workflow, while maintaining compatibility with the established practices that have made Bioconductor a successful project within the R community.</p>
<h2>
Looking Ahead
</h2><p>Adopting new infrastructure inevitably involves adjustments. For Bioconductor developers, integrating with a new build and distribution system will likely require some changes to workflows, and time to become familiar with new or different package checks, build diagnostics, and binary distribution.</p>
<p>However, by gradually moving toward common infrastructure, the Bioconductor project will benefit from improvements that are being continuously developed and maintained for the broader R ecosystem. A system based on modern continuous integration (CI) will provide developers with improved tooling, and will give the core team more time to focus on community coordination and quality control, rather than on maintaining costly infrastructure. At the same time, the shared platform provided by R-universe can help to increase the visibility and accessibility of Bioconductor software to the greater R community.</p>
<p>We look forward to continuing this alliance and to working with the Bioconductor community to ensure that the next generation of infrastructure supports the project for many years to come.</p>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://ropensci.org/blog/2026/04/08/r-universe-bioc/"> rOpenSci - open tools for open science</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/collaborating-between-bioconductor-and-r-universe-on-development-of-common-infrastructure/">Collaborating between Bioconductor and R-universe on Development of Common Infrastructure</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400401</post-id>	</item>
		<item>
		<title>EM-DAT, the world&#8217;s disaster memory, is at risk</title>
		<link>https://www.r-bloggers.com/2026/04/em-dat-the-worlds-disaster-memory-is-at-risk/</link>
		
		<dc:creator><![CDATA[R on Stats and R]]></dc:creator>
		<pubDate>Tue, 07 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://statsandr.com/blog/em-dat-the-world-s-disaster-memory-is-at-risk/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>I do not usually write posts that are calls to action. But sometimes, something important enough comes along that it would feel wrong to stay silent. This is one of those times.</p>
<p>What is EM-DAT?<br />
EM-DAT, the Emergency Events Database, is the world’s...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/em-dat-the-worlds-disaster-memory-is-at-risk/">EM-DAT, the world’s disaster memory, is at risk</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on <strong><a href="https://statsandr.com/blog/em-dat-the-world-s-disaster-memory-is-at-risk/">R on Stats and R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>]. (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>



<p><img src="https://i1.wp.com/statsandr.com/blog/em-dat-the-world-s-disaster-memory-is-at-risk/images/em-dat-the-world-s-disaster-memory-is-at-risk.jpg?w=578&#038;ssl=1" style="width:100.0%" data-recalc-dims="1" /></p>
<p>I do not usually write posts that are calls to action. But sometimes, something important enough comes along that it would feel wrong to stay silent. This is one of those times.</p>
<div id="what-is-em-dat" class="section level2">
<h2>What is EM-DAT?</h2>
<p><a href="https://www.emdat.be/" rel="nofollow" target="_blank">EM-DAT</a>, the Emergency Events Database, is the world’s most widely used and trusted global database for tracking natural and technological disasters. It has been maintained since 1988 by the <strong>Centre for Research on the Epidemiology of Disasters (CRED)</strong>, which is part of UCLouvain.</p>
<p>The database currently contains data on the occurrence and impacts of <strong>over 27,000 mass disasters</strong> worldwide, from 1900 to the present day. It covers floods, storms, earthquakes, droughts, wildfires, extreme temperatures, landslides, volcanic activity, and technological accidents, across virtually every country on earth.</p>
<p>Crucially, it is:</p>
<ul>
<li><strong>Open access</strong> (for non-commercial use)</li>
<li><strong>Globally comparable</strong>, using transparent and consistent inclusion criteria</li>
<li><strong>Cross-verified</strong> across multiple sources (UN agencies, NGOs, reinsurance companies, research institutes, press agencies)</li>
<li>The <strong>reference dataset</strong> for thousands of peer-reviewed studies, national risk assessments, and international policy processes</li>
</ul>
<p>If you have ever read a paper or report about global disaster trends, the probability is high that EM-DAT was the data source behind it.</p>
</div>
<div id="why-is-it-at-risk" class="section level2">
<h2>Why is it at risk?</h2>
<p>For more than 25 years, EM-DAT was primarily funded by the <strong>United States Agency for International Development (USAID)</strong>. Following the recent dismantling of USAID, that funding is gone, and no sustainable alternative has been secured.</p>
<p>This is not a minor budget shortfall. Without a replacement funding mechanism, EM-DAT risks shutting down entirely.</p>
</div>
<div id="why-does-it-matter" class="section level2">
<h2>Why does it matter?</h2>
<p>The <a href="https://openletter.earth/the-worlds-collective-disaster-memory-must-be-preserved-66c88c44" rel="nofollow" target="_blank">open letter</a> drafted in support of EM-DAT puts it well: in an era of intensifying climate extremes, cascading risks, and compounding crises, reliable data are not a luxury. They are the infrastructure for informed decision-making.</p>
<p>Concretely, EM-DAT underpins:</p>
<ul>
<li><strong>Disaster risk reduction and prevention policies</strong>, used by governments to assess national risks and prioritise investments</li>
<li><strong>Humanitarian operations</strong>, relied upon by multilateral agencies and NGOs to plan and forecast needs</li>
<li><strong>Climate research</strong>, providing historical baselines for understanding trends in extreme weather events</li>
<li><strong>Monitoring of global commitments</strong>, such as the Sendai Framework for Disaster Risk Reduction, the SDGs, and the Paris Agreement</li>
<li><strong>Insurance and risk modelling</strong>, used by the private sector alongside other data to benchmark losses and refine exposure models</li>
</ul>
<p>EM-DAT’s value is not just in the quantity of records. It lies in the <strong>rigour and consistency</strong> of its methodology over time and across countries. That is exactly what makes it irreplaceable. In a world awash with data, curated and quality-controlled datasets of this kind are rare. If EM-DAT were to close, the result would not be a smooth substitution. It would be fragmentation, proprietary data silos, and reduced access, particularly for lower-income countries that are already under-represented in global evidence.</p>
</div>
<div id="a-personal-note" class="section level2">
<h2>A personal note</h2>
<p>I signed the open letter after being informed of the issue by my colleague Prof. Niko Speybroeck, a leading epidemiologist at UCLouvain and program director of the CRED.</p>
<p>I do not have direct expertise in disaster epidemiology. But I do care about open data, open science, and the integrity of global research infrastructure. And EM-DAT is exactly the kind of resource that the whole scientific community relies on, often without fully realising it.</p>
</div>
<div id="how-you-can-help" class="section level2">
<h2>How you can help</h2>
<p>If you share these values, I encourage you to <a href="https://openletter.earth/the-worlds-collective-disaster-memory-must-be-preserved-66c88c44" rel="nofollow" target="_blank">sign the open letter: “The World’s collective disaster memory must be preserved”</a>.</p>
<p>The letter calls on governments, multilateral development banks, philanthropic foundations, and international organisations to step forward with a coordinated and sustainable funding arrangement for EM-DAT. The cost of maintaining the world’s primary disaster database is modest set against the billions spent on disaster response and recovery each year. The cost of losing it would be profound.</p>
<p>Please also consider sharing this post or the open letter with your own network (researchers, policymakers, students, practitioners, or anyone who cares about data-driven approaches to global challenges).</p>
</div>
<div id="more-information" class="section level2">
<h2>More information</h2>
<ul>
<li>EM-DAT website: <a href="https://www.emdat.be/" class="uri" rel="nofollow" target="_blank">https://www.emdat.be/</a></li>
<li>Open letter: <a href="https://openletter.earth/the-worlds-collective-disaster-memory-must-be-preserved-66c88c44" class="uri" rel="nofollow" target="_blank">https://openletter.earth/the-worlds-collective-disaster-memory-must-be-preserved-66c88c44</a></li>
<li>CRED at UCLouvain: <a href="https://www.uclouvain.be/en/research-institutes/irss/cred-epidemiology-of-disasters" class="uri" rel="nofollow" target="_blank">https://www.uclouvain.be/en/research-institutes/irss/cred-epidemiology-of-disasters</a></li>
</ul>
<p>As always, if you have any thoughts or questions related to this post, feel free to leave a comment below.</p>
</div>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://statsandr.com/blog/em-dat-the-world-s-disaster-memory-is-at-risk/"> R on Stats and R</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/em-dat-the-worlds-disaster-memory-is-at-risk/">EM-DAT, the world’s disaster memory, is at risk</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400384</post-id>	</item>
		<item>
		<title>Marathon Man: how to pace a marathon</title>
		<link>https://www.r-bloggers.com/2026/04/marathon-man-how-to-pace-a-marathon/</link>
		
		<dc:creator><![CDATA[Stephen Royle]]></dc:creator>
		<pubDate>Mon, 06 Apr 2026 13:36:35 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://quantixed.org/?p=3654</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> How does the average marathoner pace their race? In this post, we’ll use R to have a look at a large dataset of marathon times to try to answer this question. The ideal strategy would be to “even split” the race. This is where you run continually at the ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/marathon-man-how-to-pace-a-marathon/">Marathon Man: how to pace a marathon</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://quantixed.org/2026/04/06/marathon-man-how-to-pace-a-marathon/"> Rstats – quantixed</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p><strong>How does the average marathoner pace their race?</strong> In this post, we’ll use R to have a look at a large dataset of marathon times to try to answer this question.</p>



<p>The ideal strategy would be to “even split” the race. This is where you run at the same pace from kilometre 0 to the finish. Let’s forget about “negative splitting”, where you speed up through the race, usually by running at a constant pace for the first half or three-quarters and then increasing the pace. Negative splits are for the pros, not mere mortals! The difficulty with even-splitting the race is that it is very hard to know what pace you can maintain. The marathon gets hard for everyone after 30 km, so a slowdown is almost inevitable. Certainly, if you have started too fast, you will <strong>fade</strong>. This situation is known as “positive splitting”.</p>



<p>Why is it so hard to know what pace you can maintain? Well, you can predict a pace based on existing races, e.g. a half marathon, and there are various ways to do this, but it is difficult to tell whether you can hold that pace for a full marathon. It’s such a brutal event that training up to run one takes time, and it equally takes a while to recover, so experimentation is limited. Running a full marathon (at pace) in training is not advised. So determining an ideal pace involves quite a bit of guesswork.</p>



<p>Let’s take a look at a big dataset of marathon times – we’ll use the New York City Marathon from 2025 – to see if we can understand how to pace a marathon. There’s an available dataset of chip times (meaning we don’t have to worry about dodgy GPS data) and the <a href="https://quantixed.org/2025/11/17/choose-your-fighter-data-driven-selection-of-the-best-marathon/" rel="nofollow" target="_blank">course</a> has similar first and second half profiles, allowing us to use these times to understand negative/even/positive splitting. Let’s dive in.</p>



<p>You can skip to <a href="https://quantixed.org/2026/04/06/marathon-man-how-to-pace-a-marathon/#the-code" rel="nofollow" target="_blank">the code</a> to play along or just see the analysis here.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" src="https://i2.wp.com/quantixed.org/wp-content/uploads/2026/01/nyc_marathon_2025_split_difference_histogram-768x1024.png?w=450&#038;ssl=1" alt="" class="wp-image-3671" data-recalc-dims="1" /></figure>



<p>First, using histograms of the difference between the second and first halves of the marathon, we can see that <strong>most runners positive split the marathon</strong>. Very few runners run a negative split (blue bars, left of the dashed line). More runners even-split (yellow), but the majority run positive (red) split times.</p>



<p>For marathoners finishing in under 3 h, the modal split is only +2 minutes. Over 21.1 km this is a loss of only 6 s per km. For marathoners finishing in over three hours, the loss gets more severe. Those finishing outside of 5 h ship 20 minutes or more in the second half.</p>



<p>At first glance this looks like better pace management by the faster runners, but these positive splits could be proportional to the paces being run. In other words, a slower runner should ship more time in the second half, because they’re running more slowly.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" src="https://i1.wp.com/quantixed.org/wp-content/uploads/2026/01/nyc_marathon_2025_split_difference_scatter.png?w=450&#038;ssl=1" alt="" class="wp-image-3672" data-recalc-dims="1" /></figure>



<p>We can look at this data a different way and directly compare the first and second half times for each runner. Again, this highlights just how few runners negative- or even-split the marathon. Most are positive splitting and sit in the upper-left half of the plot. We can also see that the data veers away from the ideal even split (dashed line) at slower paces. This veering looks linear (a straight line).</p>



<p>We can fit a line to this data, constrained to pass through (60, 60), i.e. a 2 h marathoner even-splitting the race. To do this in R we can use <code>lm(formula = I(y - 60) ~ I(x - 60) + 0, data = fitting)</code>, which gives the coefficient for I(x - 60) as <strong>1.24</strong>. This is essentially the <strong>fade coefficient</strong> for the average runner in the 2025 edition of this race.</p>



<p>What does that mean? Well, for a runner achieving a 90 minute first half, their second half would most likely be: 60 + 1.239 * (90 - 60) = 97.17 minutes, giving a finish time of 3:07:10.</p>



<p>For anyone looking to run a 3 h New York Marathon, the average runner would therefore need to run 60 / 2.239 + 60 = <strong>86.8</strong> minutes for the first half to anticipate the fade. So 1:26:48 for the first half, and then 1:33:12 for the second half.</p>
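<p>These two calculations can be wrapped into a pair of small helper functions (a sketch; the function names are mine, and the 1.239 coefficient is the fitted value quoted above):</p>

```r
# Fade model anchored at (60, 60): second_half = 60 + fade * (first_half - 60)
fade <- 1.239  # fitted coefficient from the constrained lm

# Predicted second-half time (minutes) for a given first-half time
second_half <- function(first) 60 + fade * (first - 60)

# Invert the model: first-half time needed for a target finish time,
# solving  first + 60 + fade * (first - 60) = finish  for first
first_half_for <- function(finish) (finish + 60 * (fade - 1)) / (1 + fade)

second_half(90)      # 97.17 -> a 90 min first half implies a ~3:07:10 finish
first_half_for(180)  # ~86.8 -> go out in ~1:26:48 for a 3 h finish
```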



<p>A simpler calculation is to take the mean of the ratio between the two half times for everyone in the dataset. This gives a fade coefficient of <strong>1.13</strong>. The difference between these two fade coefficients is due to the lack of constraint used in the fit. The ratio predicts a positive split being inevitable for the fastest runners, which is probably not true. Anyhow, this puts the first half time at 88 minutes for folks looking to run 3 h. These fade coefficients are good predictors for a range of times, and I suspect they would be similar at other marathon events with a similar profile. <strong>You can use them to calculate your ideal pace for a target finish time.</strong></p>



<p>Finally, for the most accurate answer about sub-3 h pacing, we can look directly at runners finishing between 02:50:00 and 03:00:00 and see what they actually ran. The median first half time was 86.3 min (IQR = 84.4-87.9) and the median second half was 89.6 min (88.1-91.1). This gives a median finish time of 2:56:00. So running a 1:26:18 first half would give someone their best chance of finishing in under 3 h, allowing for the inevitable fade.</p>



<p><strong>The takeaway message is: to finish within a goal time, do not assume even splits. </strong>That is, if you want to run 3 hours 30 min and bank on 105 minutes per half (4:59/km), you will most likely fail to hit the target. Build in a buffer of time to allow for the inevitable fade. A pace of 4:45/km is a better target pace (see below).</p>



<p>Good luck!</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Finish time</th><th>Even-split pace (per km)</th><th>Target pace (per km)</th></tr></thead><tbody><tr><td>03:00:00</td><td>00:04:16</td><td>00:04:07</td></tr><tr><td>03:30:00</td><td>00:04:59</td><td>00:04:45</td></tr><tr><td>04:00:00</td><td>00:05:41</td><td>00:05:23</td></tr><tr><td>04:30:00</td><td>00:06:24</td><td>00:06:01</td></tr><tr><td>05:00:00</td><td>00:07:07</td><td>00:06:39</td></tr><tr><td>06:00:00</td><td>00:08:32</td><td>00:07:55</td></tr></tbody></table></figure>
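<p>The target-pace column follows from the same fade model. A short sketch (my own helper; it assumes the fitted 1.239 coefficient and a half-marathon distance of 21.0975 km):</p>

```r
fade <- 1.239       # fitted fade coefficient
half_km <- 21.0975  # half-marathon distance in km

# First-half pace (min/km) that anticipates the fade for a goal finish (minutes)
target_pace <- function(finish_min) {
  first <- (finish_min + 60 * (fade - 1)) / (1 + fade)  # required first-half time
  first / half_km
}

target_pace(210)  # ~4.75 min/km, i.e. about 4:45/km for a 3:30 goal
target_pace(180)  # ~4.11 min/km, i.e. about 4:07/km for a 3 h goal
```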



<h2 class="wp-block-heading" id="the-code">The code</h2>



<p>This analysis was possible thanks to the uploader for making the chip time data available. Also, a shoutout to Nicola Rennie for <a href="https://nrennie.rbind.io/blog/adding-social-media-icons-ggplot2/" rel="nofollow" target="_blank">sharing</a> how to style social media handles in <code>{ggplot2}</code> graphics. This part of my code requires my <code>{qBrand}</code> library and should be skipped if you are running the code yourself (remove the <code>caption = cap</code> argument in the ggplot calls).</p>


<pre>
library(ggplot2)
library(ggtext)

sysfonts::font_add_google(&quot;Roboto&quot;, &quot;roboto&quot;)
showtext::showtext_auto()

## data wrangling ----

# load csv file from url
url &lt;- paste0(&quot;https://huggingface.co/datasets/donaldye8812/&quot;,
              &quot;nyc-2025-marathon-splits/resolve/main/&quot;,
              &quot;nyrr_marathon_2025_summary_56480_runners_WITH_SPLITS.csv&quot;)
df &lt;- read.csv(url)

# the data frame is a long table
# we need to grab the time values where splitCode is &quot;HALF&quot; or &quot;MAR&quot;
df &lt;- df[df$splitCode %in% c(&quot;HALF&quot;, &quot;MAR&quot;), c(&quot;RunnerID&quot;, &quot;splitCode&quot;, &quot;time&quot;)]
# reshape to wide format, values are in time
df &lt;- reshape(df, idvar = &quot;RunnerID&quot;, timevar = &quot;splitCode&quot;, direction = &quot;wide&quot;)
# calculate the split times in minutes
df$split_HALF &lt;- as.numeric(
  as.difftime(df$time.HALF, format = &quot;%H:%M:%S&quot;, units = &quot;mins&quot;))
df$split_MAR &lt;- as.numeric(
  as.difftime(df$time.MAR, format = &quot;%H:%M:%S&quot;, units = &quot;mins&quot;))
# calculate the second half time
df$split_SECOND_HALF &lt;- df$split_MAR - df$split_HALF
# remove rows with NA values
df &lt;- df[!is.na(df$split_SECOND_HALF), ]
# calculate the difference
df$Difference &lt;- df$split_SECOND_HALF - df$split_HALF
# difference as a fraction of first half
df$Difference_Fraction &lt;- df$Difference / df$split_HALF * 100
# classify into finish-time categories
df$Category &lt;- cut(df$split_MAR,
                   breaks = c(0, 180, 210, 240, 300, Inf),
                   labels = c(&quot;Sub 3 h&quot;, &quot;3:00-3:30&quot;, &quot;3:30-4:00&quot;,
                              &quot;4:00-5:00&quot;, &quot;Over 5 h&quot;))

## plot styling ----

social &lt;- qBrand::qSocial()
cap &lt;-  paste0(
  &quot;**Data:** New York City Marathon 2025 Results&lt;br&gt;**Graphic:** &quot;,social
)

my_palette &lt;- c(&quot;Sub 3 h&quot; = &quot;#cb2029&quot;,
                &quot;3:00-3:30&quot; = &quot;#147f77&quot;,
                &quot;3:30-4:00&quot; = &quot;#cf6d21&quot;,
                &quot;4:00-5:00&quot; = &quot;#28a91b&quot;,
                &quot;Over 5 h&quot; = &quot;#a31a6d&quot;)

## make the plots ----

ggplot(df, aes(x = Difference, fill = after_stat(x))) +
  # vertical line at x = 0
  geom_vline(xintercept = 0, linetype = &quot;dashed&quot;, color = &quot;black&quot;) +
  geom_histogram(breaks = seq(
    from = -59.5, to = 81.5, by = 1), color = &quot;black&quot;) +
  scale_colour_gradient2(
    low = &quot;#2b83ba&quot;,
    mid = &quot;#ffffbf&quot;,
    high = &quot;#d7191c&quot;,
    midpoint = 0,
    limits = c(-15,15),
    na.value = &quot;#ffffffff&quot;,
    guide = &quot;colourbar&quot;,
    aesthetics = &quot;fill&quot;,
    oob = scales::squish
  ) +
  scale_x_continuous(breaks = seq(-45,90,15), limits = c(-40, 80)) +
  facet_wrap(~ Category, ncol = 1, scales = &quot;free_y&quot;) +
  labs(title = &quot;Most runners positive split the marathon&quot;,
       x = &quot;Difference in minutes (Second Half - First Half)&quot;,
       y = &quot;Number of Runners&quot;,
       caption = cap) +
  theme_classic() +
  # hide legend
  theme(legend.position = &quot;none&quot;) +
  theme(
    plot.caption = element_textbox_simple(
      colour = &quot;grey25&quot;,
      hjust = 0,
      halign = 0,
      margin = margin(b = 0, t = 5),
      size = rel(0.9)
    ),
    text = element_text(family = &quot;roboto&quot;, size = 16),
    plot.title = element_text(size = rel(1.2),
                              face = &quot;bold&quot;)
  )

ggsave(&quot;Output/Plots/nyc_marathon_2025_split_difference_histogram.png&quot;,
       width = 900, height = 1200, dpi = 72, units = &quot;px&quot;, bg = &quot;white&quot;)

ggplot() +
  geom_abline(slope = 1, linetype = &quot;dashed&quot;, color = &quot;black&quot;) +
  geom_point(data = df,
             aes(x = split_HALF, y = split_SECOND_HALF, colour = Category),
             shape = 16, size = 1.5, alpha = 0.1) +
  scale_x_continuous(breaks = seq(from = 0, to = 12 * 30, by = 30),
                     labels = seq(from = 0, to = 6, by = 0.5),
                     limits = c(1 * 60, 5 * 60)) +
  scale_y_continuous(breaks = seq(from = 0, to = 12 * 30, by = 30),
                     labels = seq(from = 0, to = 6, by = 0.5),
                     limits = c(1 * 60, 5 * 60)) +
  scale_colour_manual(values = my_palette) +
  labs(x = &quot;First half time (h)&quot;,
       y = &quot;Second half time (h)&quot;,
       caption = cap) +
  theme_bw() +
  theme(
    plot.caption = element_textbox_simple(
      colour = &quot;grey25&quot;,
      hjust = 0,
      halign = 0,
      margin = margin(b = 0, t = 10),
      size = rel(0.9)
    ),
    text = element_text(family = &quot;roboto&quot;, size = 16)
  ) +
  guides(colour = guide_legend(override.aes = list(alpha = 1)))

ggsave(&quot;Output/Plots/nyc_marathon_2025_split_difference_scatter.png&quot;,
       width = 1000, height = 800, dpi = 72, units = &quot;px&quot;, bg = &quot;white&quot;)
</pre>


<p>From this data we can also make some calculations to understand the fade.</p>


<pre>
## fitting ----

# to fit, we&#039;ll constrain the line to go through (60,60), i.e. a
# 2 h marathoner who runs even splits
fitting &lt;- data.frame(x = df$split_HALF,y = df$split_SECOND_HALF)
lm( I(y-60) ~ I(x-60) + 0, data = fitting)


# Call:
#   lm(formula = I(y - 60) ~ I(x - 60) + 0, data = fitting)
# 
# Coefficients:
#   I(x - 60)  
# 1.239  

# so for a 90 minute first half, second half would be:
# 60 + 1.239 * (90 - 60) = 97.17 minutes, a finish time of 3:07:10

# to run a 3 h New York Marathon, the average runner needs to run
# 60 / 2.239 + 60 = 86.8 minutes for the first half
# so 1:26:48 for the first half, and 1:33:12 for the second half

# a more simple approach is to calculate the mean of the ratios
mean_ratio &lt;- mean(df$split_SECOND_HALF / df$split_HALF)
mean_ratio
# [1] 1.127581

# filter the df for finish times between 170 and 180 minutes
target &lt;- df[df$split_MAR &gt; 170 & df$split_MAR &lt; 180,]
summary(target)


    RunnerID         time.HALF           time.MAR           split_HALF      split_MAR     split_SECOND_HALF   Difference     
 Min.   :48819892   Length:1289        Length:1289        Min.   :70.25   Min.   :170.0   Min.   : 82.70    Min.   :-7.6833  
 1st Qu.:48834548   Class :character   Class :character   1st Qu.:84.42   1st Qu.:173.6   1st Qu.: 88.07    1st Qu.: 0.7167  
 Median :48849752   Mode  :character   Mode  :character   Median :86.30   Median :176.0   Median : 89.62    Median : 3.0500  
 Mean   :48849498                                         Mean   :85.98   Mean   :175.7   Mean   : 89.73    Mean   : 3.7585  
 3rd Qu.:48864551                                         3rd Qu.:87.87   3rd Qu.:178.2   3rd Qu.: 91.12    3rd Qu.: 5.9000  
 Max.   :48878979                                         Max.   :92.87   Max.   :180.0   Max.   :106.02    Max.   :35.7667  
 Difference_Fraction      Category   
 Min.   :-8.5008     Sub 3 h  :1289  
 1st Qu.: 0.8051     3:00-3:30:   0  
 Median : 3.5390     3:30-4:00:   0  
 Mean   : 4.5178     4:00-5:00:   0  
 3rd Qu.: 7.0055     Over 5 h :   0  
 Max.   :50.9134   
</pre>


<p>—</p>



<p>The post title comes from “Marathon Man” by Ian Brown from his “My Way” album. He’s wearing a track suit on the cover, but that’s not optimal wear for running a marathon.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://quantixed.org/2026/04/06/marathon-man-how-to-pace-a-marathon/"> Rstats – quantixed</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/marathon-man-how-to-pace-a-marathon/">Marathon Man: how to pace a marathon</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400370</post-id>	</item>
		<item>
		<title>One interface, (Almost) Every Classifier: unifiedml v0.2.1</title>
		<link>https://www.r-bloggers.com/2026/04/one-interface-almost-every-classifier-unifiedml-v0-2-1/</link>
		
		<dc:creator><![CDATA[T. Moudiki]]></dc:creator>
		<pubDate>Sat, 04 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://thierrymoudiki.github.io//blog/2026/04/04/r/more-unifiedml-classifiers</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> A new version of `unifiedml` is out; available on CRAN. `unifiedml` is an effort to offer a unified interface to R's machine learning models.</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/one-interface-almost-every-classifier-unifiedml-v0-2-1/">One interface, (Almost) Every Classifier: unifiedml v0.2.1</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://thierrymoudiki.github.io//blog/2026/04/04/r/more-unifiedml-classifiers"> T. Moudiki's Webpage - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>A new version of <code>unifiedml</code> is out; available on CRAN. <code>unifiedml</code> is an effort to offer a unified interface to R’s machine learning models.</p>

<p>The main change in version <code>0.2.1</code> is the removal of <code>type</code> (of prediction) from <code>predict</code>, and the use of <code>...</code> instead, which is more generic and flexible.</p>

<p><strong>This post contains advanced examples of using <code>unifiedml</code> for classification</strong>, with <code>ranger</code> and <code>xgboost</code>. More examples have been added to <a href="https://cloud.r-project.org/web/packages/unifiedml/vignettes/unifiedml-vignette.html" rel="nofollow" target="_blank">the package vignettes</a> too.</p>
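<p>Any fitter can be plugged in via the same small S3 wrapper convention used below for <code>ranger</code> and <code>xgboost</code>. As a minimal illustration (my own sketch, not code from the package), here is the pattern with base R’s <code>stats::glm</code>:</p>

```r
# S3 wrapper following the fit/predict convention shown in the
# ranger and xgboost examples below (a sketch, not package code)
my_glm <- function(x, y, ...) {
  y <- factor(y)  # drops unused levels if y comes from a subset
  df <- data.frame(y = y, x)
  fit <- stats::glm(y ~ ., data = df, family = binomial(), ...)
  structure(list(fit = fit, levels = levels(y)), class = "my_glm")
}

predict.my_glm <- function(object, newdata = NULL, newx = NULL, ...) {
  if (!is.null(newx)) newdata <- newx  # accept both argument conventions
  p <- predict(object$fit, newdata = as.data.frame(newdata), type = "response")
  object$levels[(p > 0.5) + 1]         # probability -> class label
}
```

<p>With these two functions defined, <code>Model$new(my_glm)</code>, <code>mod$fit(X_train, y_train)</code> and <code>mod$predict(X_test)</code> should work exactly as in the examples below.</p>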

<pre>install.packages(&quot;unifiedml&quot;)

install.packages(c(&quot;ranger&quot;))

library(&quot;unifiedml&quot;)

Loading required package: doParallel

Loading required package: foreach

Loading required package: iterators

Loading required package: parallel

Loading required package: R6
</pre>

<h1 id="1---ranger-example">1 &#8211; <code>ranger</code> example</h1>

<pre>library(ranger)



# 2 - 'ranger' classification ---------------------------

# -------------------------------
# S3 wrapper for ranger
# -------------------------------

# Fit function remains the same
my_ranger &lt;- function(x, y, ...) {
  if (!is.data.frame(x)) x &lt;- as.data.frame(x)
  y &lt;- as.factor(y)
  colnames(x) &lt;- paste0(&quot;X&quot;, seq_len(ncol(x)))
  df &lt;- data.frame(y = y, x)
  fit &lt;- ranger::ranger(y ~ ., data = df, probability = TRUE, ...)
  structure(list(fit = fit), class = &quot;my_ranger&quot;)
}

# Predict only with newdata
predict.my_ranger &lt;- function(object, newdata = NULL, newx = NULL, ...) {
  if (!is.null(newx)) newdata &lt;- newx
  if (is.null(newdata)) stop(&quot;No data provided for prediction&quot;)
  if (is.matrix(newdata)) newdata &lt;- as.data.frame(newdata)
  # Unconditionally rename to match training
  colnames(newdata) &lt;- paste0(&quot;X&quot;, seq_len(ncol(newdata)))
  preds &lt;- predict(object$fit, data = newdata)$predictions
  if (is.matrix(preds) &#038;&#038; ncol(preds) == 2) {
    lvls &lt;- colnames(preds)
    return(ifelse(preds[, 2] &gt; 0.5, lvls[2], lvls[1]))
  }

  preds
}

# Print method
print.my_ranger &lt;- function(x, ...) {
  cat(&quot;my_ranger model\n&quot;)
  print(x$fit)
}

# -------------------------------
# Example: Iris binary classification
# -------------------------------

set.seed(123)
iris_binary &lt;- iris[iris$Species %in% c(&quot;setosa&quot;, &quot;versicolor&quot;), ]
X_binary &lt;- iris_binary[, 1:4]
y_binary &lt;- as.factor(as.character(iris_binary$Species))

# Train/test split
train_idx &lt;- sample(seq_len(nrow(X_binary)), size = 0.7 * nrow(X_binary))
X_train &lt;- X_binary[train_idx, ]
y_train &lt;- y_binary[train_idx]
X_test &lt;- X_binary[-train_idx, ]
y_test &lt;- y_binary[-train_idx]

# Initialize model
mod &lt;- Model$new(my_ranger)

# Fit on training data only
mod$fit(X_train, y_train, num.trees = 150L)

# Predict on test set
preds &lt;- mod$predict(X_test)

# Evaluate
table(Predicted = preds, True =y_test)
mean(preds == y_test)  # Accuracy



# 5-fold cross-validation on training set
cv_scores &lt;- cross_val_score(
  mod,
  X_train,
  y_train,
  num.trees = 150L,
  cv = 5L
)

cv_scores
mean(cv_scores)  # average CV accuracy


            True
Predicted    setosa versicolor
  setosa         15          0
  versicolor      0         15
</pre>

<pre>
## test-set accuracy
[1] 1

## 5-fold CV accuracies
[1] 1 1 1 1 1

## mean CV accuracy
[1] 1
</pre>

<h1 id="2---xgboost-example">2 - <code>xgboost</code> example</h1>

<pre>library(xgboost)

my_xgboost &lt;- function(x, y, ...) {
  
  # Convert to matrix safely
  if (!is.matrix(x)) {
    x &lt;- as.matrix(x)
  }
  
  # Handle factors
  if (is.factor(y)) {
    y &lt;- as.numeric(y) - 1
  }
  
  fit &lt;- xgboost::xgboost(
    data = x,
    label = y,
    ...
  )
  
  structure(list(fit = fit), class = &quot;my_xgboost&quot;)
}

predict.my_xgboost &lt;- function(object, newdata = NULL, newx = NULL, ...) {
  
  # Accept both the newdata and newx conventions
  if (!is.null(newx)) {
    newdata &lt;- newx
  }
  
  newdata &lt;- as.matrix(newdata)
  
  preds &lt;- predict(object$fit, newdata)
  
  # Binary classification → class labels
  if (!is.null(object$fit$params$objective) &#038;&#038;
      grepl(&quot;binary&quot;, object$fit$params$objective)) {
    
    return(ifelse(preds &gt; 0.5, 1, 0))
  }
  
  preds
}

print.my_xgboost &lt;- function(x, ...) {
  cat(&quot;my_xgboost model\n&quot;)
  print(x$fit)
}


set.seed(123)  # for reproducibility

# Binary subset
iris_binary &lt;- iris[iris$Species %in% c(&quot;setosa&quot;, &quot;versicolor&quot;), ]
X_binary &lt;- as.matrix(iris_binary[, 1:4])
y_binary &lt;- as.factor(as.character(iris_binary$Species))

# Split indices: 70% train, 30% test
train_idx &lt;- sample(seq_len(nrow(X_binary)), size = 0.7 * nrow(X_binary))
X_train &lt;- X_binary[train_idx, ]
y_train &lt;- y_binary[train_idx]
X_test &lt;- X_binary[-train_idx, ]
y_test &lt;- y_binary[-train_idx]

# Initialize model
mod &lt;- Model$new(my_xgboost)

# Fit on training data only
mod$fit(X_train, y_train, nrounds = 50, objective = &quot;binary:logistic&quot;)

# Predict on test set
preds &lt;- mod$predict(X_test)

# Evaluate
table(Predicted = preds, True =y_test)
mean(preds == y_test)  # Accuracy



# 5-fold cross-validation on training set
cv_scores &lt;- cross_val_score(
  mod, 
  X_train, 
  y_train, 
  nrounds = 50, 
  objective = &quot;binary:logistic&quot;, 
  cv = 5L
)

cv_scores
mean(cv_scores)  # average CV accuracy
</pre>

<p><img src="https://i1.wp.com/thierrymoudiki.github.io/images/2026-04-04/2026-04-04-image1.png?w=578&#038;ssl=1" alt="image-title-here" class="img-responsive" data-recalc-dims="1" /></p>


<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://thierrymoudiki.github.io//blog/2026/04/04/r/more-unifiedml-classifiers"> T. Moudiki's Webpage - R</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/one-interface-almost-every-classifier-unifiedml-v0-2-1/">One interface, (Almost) Every Classifier: unifiedml v0.2.1</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400351</post-id>	</item>
	</channel>
</rss>
